Dela via


SystemGetCrossValidationResults (Analysis Services - Data Mining)

Partitions the mining structure into the specified number of cross-sections, trains a model for each partition, and then returns accuracy metrics for each partition.

Note

This stored procedure cannot be used to cross-validate clustering models, or models that are built by using the Microsoft Time Series algorithm or the Microsoft Sequence Clustering algorithm. To cross-validate clustering models, you can use the separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).

Syntax

SystemGetCrossValidationResults(
<mining structure>
[, <mining model list>]
,<fold count>
,<max cases>
,<target attribute>
[,<target state>]
[,<target threshold>]
[,<test list>])

Arguments

  • mining structure
    Name of a mining structure in the current database.

    (required)

  • mining model list
    Comma-separated list of mining models to validate.

    If a model name contains any characters that are not valid in the name of an identifier, the name must be enclosed in brackets.

    If a list of mining models is not specified, cross-validation is performed against all models that are associated with the specified structure and that contain a predictable attribute.

    Note

    To cross-validate clustering models, you must use a separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).

    (optional)

  • fold count
    Integer that specifies the number of partitions into which to separate the data set. The minimum value is 2. The maximum number of folds is maximum integer or the number of cases, whichever is lower.

    Each partition will contain roughly this number of cases: max cases/fold count.

    There is no default value.

    Note

    The number of folds greatly affects the time that is required to perform cross-validation. If you select a number that is too high, the query might run for a very long time, and in some cases the server can become unresponsive or time out.

    (required)

  • max cases
    Integer that specifies the maximum number of cases that can be tested across all folds.

    A value of 0 indicates that all the cases in the data source will be used.

    If you specify a value that is greater than the actual number of cases in the data set, all cases in the data source will be used.

    There is no default value.

    (required)

  • target attribute
    String that contains the name of the predictable attribute. A predictable attribute can be a column, nested table column, or nested table key column of a mining model.

    Note

    The existence of the target attribute is validated only at run time.

    (required)

  • target state
    Formula that specifies the value to predict. If a target value is specified, metrics are collected for the specified value only.

    If a value is not specified or is null, the metrics are computed for the most probable state for each prediction.

    The default is null.

    An error is raised during validation if the specified value is not valid for the specified attribute, or if the formula is not the correct type for the specified attribute.

    (optional)

  • target threshold
    Double greater than 0 and less than 1. Indicates the minimum probability score that must be obtained for the prediction of the specified target state to be counted as correct.

    A prediction that has a probability less than or equal to this value is considered incorrect.

    If no value is specified or is null, the most probable state is used, regardless of its probability score.

    The default is null.

    Note

    Analysis Services will not raise an error if you set state threshold to 0.0, but you should never use this value. In effect, a threshold of 0.0 means that predictions with a 0 percent probability are counted as correct.

    (optional)

  • test list
    A string that specifies testing options.

    Note   This parameter is reserved for future use.

    (optional)

Return Type

The rowset that is returned contains scores for each partition in each model.

The following table describes the columns in the rowset.

Column Name

Description

ModelName

The name of the model that was tested.

AttributeName

The name of the predictable column.

AttributeState

A specified target value in the predictable column. If this value is null, the most probable prediction was used.

If this column contains a value, the accuracy of the model is assessed against this value only.

PartitionIndex

An 1-based index that identifies to which partition the results apply.

PartitionSize

An integer that indicates how many cases were included in each partition.

Test

Category of the test that was performed. For a description of the categories and the tests that are included in each category, see Cross-Validation Report (Analysis Services - Data Mining).

Measure

The name of the measure returned by the test. Measures for each model depend on the type of the predictable value. For a definition of each measure, see Cross-Validation (Analysis Services - Data Mining).

For a list of measures returned for each predictable type, see Cross-Validation Report (Analysis Services - Data Mining).

Value

The value of the specified test measure.

Remarks

To return accuracy metrics for the complete data set, use SystemGetAccuracyResults (Analysis Services - Data Mining).

If the mining model has already been partitioned into folds, you can bypass processing and return only the results of cross-validation by using SystemGetAccuracyResults (Analysis Services - Data Mining).

Examples

The following example demonstrates how to partition a mining structure for cross-validation into two folds, and then test two mining models that are associated with the mining structure, [v Target Mail].

Line three of the code lists the mining models that you want to test. If you do not specify the list, all non-clustering models associated with the structure are used. Line four of the code specifies the number of partitions. Because no value is specified for max cases, all cases in the mining structure are used and distributed evenly across the partitions.

Line five specifies the predictable attribute, Bike Buyer, and line six specifies the value to predict, 1 (meaning "yes, will buy").

The NULL value in line seven indicates that there is no minimum probability bar that must be met. Therefore, the first prediction that has a non-zero probability will be used in assessing accuracy.

CALL SystemGetCrossValidationResults(
[v Target Mail],
[Target Mail DT], [Target Mail NB],
2,
'Bike Buyer',
1,
NULL
)

Sample results:

ModelName

AttributeName

AttributeState

PartitionIndex

PartitionSize

Test

Measure

Value

Target Mail DT

Bike Buyer

1

1

500

Classification

True Positive

144

Target Mail DT

Bike Buyer

1

1

500

Classification

False Positive

105

Target Mail DT

Bike Buyer

1

1

500

Classification

True Negative

186

Target Mail DT

Bike Buyer

1

1

500

Classification

False Negative

65

Target Mail DT

Bike Buyer

1

1

500

Likelihood

Log Score

-0.619042807138345

Target Mail DT

Bike Buyer

1

1

500

Likelihood

Lift

0.0740963734002671

Target Mail DT

Bike Buyer

1

1

500

Likelihood

Root Mean Square Error

0.346946279977653

Target Mail DT

Bike Buyer

1

2

500

Classification

True Positive

162

Target Mail DT

Bike Buyer

1

2

500

Classification

False Positive

86

Target Mail DT

Bike Buyer

1

2

500

Classification

True Negative

165

Target Mail DT

Bike Buyer

1

2

500

Classification

False Negative

87

Target Mail DT

Bike Buyer

1

2

500

Likelihood

Log Score

-0.654117781086519

Target Mail DT

Bike Buyer

1

2

500

Likelihood

Lift

0.038997399132084

Target Mail DT

Bike Buyer

1

2

500

Likelihood

Root Mean Square Error

0.342721344892651

Requirements

Cross-validation is available only in SQL Server Enterprise beginning with SQL Server 2008.