SystemGetCrossValidationResults (Analysis Services - Data Mining)
Partitions the mining structure data into the specified number of cross-sections (folds), trains a model for each partition, and then returns accuracy metrics for each partition.
Note
This stored procedure cannot be used to cross-validate clustering models, or models that are built by using the Microsoft Time Series algorithm or the Microsoft Sequence Clustering algorithm. To cross-validate clustering models, you can use the separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).
Syntax
SystemGetCrossValidationResults(
<mining structure>
[, <mining model list>]
,<fold count>
,<max cases>
,<target attribute>
[,<target state>]
[,<target threshold>]
[,<test list>])
Arguments
mining structure
Name of a mining structure in the current database.
(required)
mining model list
Comma-separated list of mining models to validate. If a model name contains any characters that are not valid in the name of an identifier, the name must be enclosed in brackets.
If a list of mining models is not specified, cross-validation is performed against all models that are associated with the specified structure and that contain a predictable attribute.
Note
To cross-validate clustering models, you must use a separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).
(optional)
fold count
Integer that specifies the number of partitions into which to separate the data set. The minimum value is 2. The maximum number of folds is the maximum integer value or the number of cases, whichever is lower. Each partition will contain roughly this number of cases: max cases / fold count. For example, with max cases set to 1000 and fold count set to 10, each partition contains approximately 100 cases.
There is no default value.
Note
The number of folds greatly affects the time that is required to perform cross-validation. If you select a number that is too high, the query might run for a very long time, and in some cases the server can become unresponsive or time out.
(required)
max cases
Integer that specifies the maximum number of cases that can be tested across all folds. A value of 0 indicates that all the cases in the data source will be used.
If you specify a value that is greater than the actual number of cases in the data set, all cases in the data source will be used.
There is no default value.
(required)
target attribute
String that contains the name of the predictable attribute. A predictable attribute can be a column, nested table column, or nested table key column of a mining model.
Note
The existence of the target attribute is validated only at run time.
(required)
target state
Formula that specifies the value to predict. If a target value is specified, metrics are collected for the specified value only. If a value is not specified or is null, the metrics are computed for the most probable state for each prediction.
The default is null.
An error is raised during validation if the specified value is not valid for the specified attribute, or if the formula is not the correct type for the specified attribute.
(optional)
target threshold
Double greater than 0 and less than 1. Indicates the minimum probability score that a prediction of the specified target state must obtain to be counted as correct. A prediction that has a probability less than or equal to this value is considered incorrect.
If no value is specified or is null, the most probable state is used, regardless of its probability score.
The default is null.
Note
Analysis Services will not raise an error if you set the target threshold to 0.0, but you should never use this value. In effect, a threshold of 0.0 means that predictions with a 0 percent probability are counted as correct.
(optional)
test list
A string that specifies testing options.
Note
This parameter is reserved for future use.
(optional)
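Taken together, the arguments map onto a call such as the following sketch. The structure and model names are borrowed from the example later in this topic; the values 10, 1000, 1, and 0.5 are illustrative only, and the optional test list argument is omitted. The inline // comments simply label each argument.
CALL SystemGetCrossValidationResults(
[v Target Mail],   // mining structure
[Target Mail DT],  // mining model list
10,                // fold count
1000,              // max cases; 0 would use all cases in the data source
'Bike Buyer',      // target attribute
1,                 // target state
0.5                // target threshold: a prediction of 1 counts as correct only when its probability is greater than 0.5
)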
Return Type
The rowset that is returned contains scores for each partition in each model.
The following table describes the columns in the rowset.
| Column Name | Description |
|---|---|
| ModelName | The name of the model that was tested. |
| AttributeName | The name of the predictable column. |
| AttributeState | A specified target value in the predictable column. If this value is null, the most probable prediction was used. If this column contains a value, the accuracy of the model is assessed against this value only. |
| PartitionIndex | A 1-based index that identifies the partition to which the results apply. |
| PartitionSize | An integer that indicates how many cases were included in each partition. |
| Test | The category of the test that was performed. For a description of the categories and the tests that are included in each category, see Cross-Validation Report (Analysis Services - Data Mining). |
| Measure | The name of the measure returned by the test. Measures for each model depend on the type of the predictable value. For a definition of each measure, see Cross-Validation (Analysis Services - Data Mining). For a list of measures returned for each predictable type, see Cross-Validation Report (Analysis Services - Data Mining). |
| Value | The value of the specified test measure. |
Remarks
To return accuracy metrics for the complete data set, use SystemGetAccuracyResults (Analysis Services - Data Mining).
If the mining model has already been partitioned into folds, you can bypass processing and return only the results of cross-validation by using SystemGetAccuracyResults (Analysis Services - Data Mining).
Examples
The following example demonstrates how to partition a mining structure for cross-validation into two folds, and then test two mining models that are associated with the mining structure, [v Target Mail].
Line three of the code lists the mining models that you want to test. If you do not specify the list, all non-clustering models that are associated with the structure are used. Line four of the code specifies the number of partitions, and line five sets max cases to 0, so that all cases in the mining structure are used and distributed evenly across the partitions.
Line six specifies the predictable attribute, Bike Buyer, and line seven specifies the value to predict, 1 (meaning "yes, will buy").
The NULL value in line eight indicates that there is no minimum probability bar that must be met. Therefore, the first prediction that has a non-zero probability will be used in assessing accuracy.
CALL SystemGetCrossValidationResults(
[v Target Mail],
[Target Mail DT], [Target Mail NB],
2,
0,
'Bike Buyer',
1,
NULL
)
Sample results:
| ModelName | AttributeName | AttributeState | PartitionIndex | PartitionSize | Test | Measure | Value |
|---|---|---|---|---|---|---|---|
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Positive | 144 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Positive | 105 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Negative | 186 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Negative | 65 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Log Score | -0.619042807138345 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Lift | 0.0740963734002671 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Root Mean Square Error | 0.346946279977653 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Positive | 162 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Positive | 86 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Negative | 165 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Negative | 87 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Log Score | -0.654117781086519 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Lift | 0.038997399132084 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Root Mean Square Error | 0.342721344892651 |
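As a quick check on how the classification measures relate to each partition, the four counts for the first partition sum to the partition size (144 + 105 + 186 + 65 = 500), and the simple accuracy for that fold works out to (144 + 186) / 500 = 0.66. The second partition sums the same way (162 + 86 + 165 + 87 = 500), with an accuracy of (162 + 165) / 500 = 0.654.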
Requirements
Cross-validation is available only in SQL Server Enterprise beginning with SQL Server 2008.