ClusteringCatalog.CrossValidate Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Run cross-validation over numberOfFolds folds of data by fitting estimator, respecting samplingKeyColumnName if provided. Then evaluate each sub-model against labelColumnName and return metrics.
public System.Collections.Generic.IReadOnlyList<Microsoft.ML.TrainCatalogBase.CrossValidationResult<Microsoft.ML.Data.ClusteringMetrics>> CrossValidate (Microsoft.ML.IDataView data, Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> estimator, int numberOfFolds = 5, string labelColumnName = default, string featuresColumnName = default, string samplingKeyColumnName = default, int? seed = default);
member this.CrossValidate : Microsoft.ML.IDataView * Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> * int * string * string * string * Nullable<int> -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.TrainCatalogBase.CrossValidationResult<Microsoft.ML.Data.ClusteringMetrics>>
Public Function CrossValidate (data As IDataView, estimator As IEstimator(Of ITransformer), Optional numberOfFolds As Integer = 5, Optional labelColumnName As String = Nothing, Optional featuresColumnName As String = Nothing, Optional samplingKeyColumnName As String = Nothing, Optional seed As Nullable(Of Integer) = Nothing) As IReadOnlyList(Of TrainCatalogBase.CrossValidationResult(Of ClusteringMetrics))
Parameters
- data
- IDataView
The data to run cross-validation on.
- estimator
- IEstimator<ITransformer>
The estimator to fit.
- numberOfFolds
- Int32
Number of cross-validation folds.
- labelColumnName
- String
Optional label column for evaluation (clustering tasks may not always have a label).
- featuresColumnName
- String
Optional features column for evaluation (needed for calculating the Davies-Bouldin index, or Dbi, metric).
- samplingKeyColumnName
- String
Name of a column to use for grouping rows. If two examples share the same value of the samplingKeyColumnName, they are guaranteed to appear in the same subset (train or test). This can be used to ensure no label leakage from the train to the test set. If null, no row grouping will be performed.
- seed
- Nullable<Int32>
Seed for the random number generator used to select rows for cross-validation folds.
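The following is a minimal sketch of how the parameters above might be combined in a call, not an official sample. The `Point` class, the synthetic data, and the choice of a K-means trainer with three clusters are assumptions made for illustration; only the `CrossValidate` signature comes from this page. No label column is supplied, so the sketch prints only the metrics that do not depend on a label.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type for this sketch: one fixed-size feature vector per row.
public class Point
{
    [VectorType(2)]
    public float[] Features { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic data (an assumption for illustration): three loose groups of 2-D points.
        var rng = new Random(0);
        var centers = new[] { (0f, 0f), (5f, 5f), (0f, 5f) };
        var points = Enumerable.Range(0, 300).Select(i =>
        {
            var (cx, cy) = centers[i % 3];
            return new Point
            {
                Features = new[] { cx + (float)rng.NextDouble(), cy + (float)rng.NextDouble() }
            };
        }).ToList();

        IDataView data = mlContext.Data.LoadFromEnumerable(points);

        // Any clustering estimator could be used here; K-means is just an example choice.
        var estimator = mlContext.Clustering.Trainers.KMeans(
            featureColumnName: "Features",
            numberOfClusters: 3);

        // featuresColumnName is passed so that the Davies-Bouldin index can be calculated.
        // labelColumnName and samplingKeyColumnName are left at their defaults.
        var results = mlContext.Clustering.CrossValidate(
            data,
            estimator,
            numberOfFolds: 5,
            featuresColumnName: "Features",
            seed: 0);

        // One result per fold, each carrying the metrics for that fold's held-out data.
        foreach (var result in results)
        {
            Console.WriteLine(
                $"Fold {result.Fold}: " +
                $"AverageDistance={result.Metrics.AverageDistance:F4}, " +
                $"DaviesBouldinIndex={result.Metrics.DaviesBouldinIndex:F4}");
        }
    }
}
```

If the data contained a column identifying related rows (for example, rows derived from the same source record), its name could be passed as samplingKeyColumnName so that those rows stay in the same fold.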