NormalizingEstimator Class

Definition

public sealed class NormalizingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.Transforms.NormalizingTransformer>
type NormalizingEstimator = class
    interface IEstimator<NormalizingTransformer>
Public NotInheritable Class NormalizingEstimator
Implements IEstimator(Of NormalizingTransformer)
Inheritance
Object → NormalizingEstimator
Implements
IEstimator<NormalizingTransformer>

Remarks

Estimator Characteristics

  • Does this estimator need to look at the data to train its parameters? Yes
  • Input column data type: Single or Double, or a known-sized vector of those types.
  • Output column data type: The same data type as the input column.
  • Exportable to ONNX: Yes

The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created (a brief creation sketch follows the list):

  • Min Max - A linear rescale that is based upon the minimum and maximum values for each row.
  • Mean Variance - Rescale each row to unit variance and, optionally, zero mean.
  • Log Mean Variance - Rescale each row to unit variance and, optionally, zero mean, based on computations in log scale.
  • Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins.
  • Supervised Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins. The bin calculation is based on correlation with the Label column.
  • Robust Scaling - Optionally centers the data and scales it based on the range of the data and the quantile min and max values provided. This method is more robust to outliers.
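
As a rough illustration, the sketch below shows how each mode might be requested through the corresponding NormalizationCatalog extension methods; the mlContext variable, the "Features" and "Label" column names, and the specific parameter values are assumptions made for this example, not part of this reference.

using Microsoft.ML;

var mlContext = new MLContext();

// Each call below is expected to return a NormalizingEstimator configured for one mode.
// "Features" stands for a hypothetical column of type Single or a known-sized vector of Single.
var minMax       = mlContext.Transforms.NormalizeMinMax("Features");
var meanVariance = mlContext.Transforms.NormalizeMeanVariance("Features");
var logMeanVar   = mlContext.Transforms.NormalizeLogMeanVariance("Features");
var binning      = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 10);
var supervised   = mlContext.Transforms.NormalizeSupervisedBinning("Features", labelColumnName: "Label");
var robust       = mlContext.Transforms.NormalizeRobustScaling("Features", centerData: true,
                                                               quantileMin: 25, quantileMax: 75);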

Estimator Details

The interval of the normalized data depends on whether fixZero is specified or not; fixZero defaults to true. When fixZero is false, the normalized interval is $[0,1]$ and the distribution of the normalized values depends on the normalization mode. For example, with Min Max, the minimum and maximum values are mapped to 0 and 1 respectively and the remaining values fall in between. When fixZero is set, the normalized interval is $[-1,1]$, again with a distribution that depends on the normalization mode, but the behavior is different. With Min Max, the value with the largest distance from 0 is mapped to 1 if it is positive or to -1 if it is negative, and every other value is scaled by its distance from 0, so the majority of values, which tend to lie closer together, normalize toward 0. Robust Scaling does not use fixZero, and its values are not constrained to $[0,1]$ or $[-1,1]$; its scaling is based on the range of the data and the quantile min and max provided.
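
As a worked illustration consistent with the description above (the training values are assumed here for the example), consider Min Max normalization over the values $\{-2, 1, 4\}$. With fixZero set to false, $x$ is rescaled as $(x - \text{min}) / (\text{max} - \text{min})$, so $-2 \mapsto 0$, $1 \mapsto 0.5$, and $4 \mapsto 1$. With fixZero set to true, $x$ is instead scaled by the largest absolute value, $x / 4$, so $-2 \mapsto -0.5$, $1 \mapsto 0.25$, $4 \mapsto 1$, and an input of 0 always maps to 0.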

The equation for the output $y$ of applying both Mean Variance and Log Mean Variance on input $x$ without using the CDF option is: $y = (x - \text{offset}) \cdot \text{scale}$, where offset and scale are computed during training.
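
For example, if training had produced $\text{offset} = 3$ and $\text{scale} = 0.5$ (values assumed here for illustration, consistent with a mean of 3 and a standard deviation of 2 under Mean Variance), an input $x = 5$ would be normalized to $y = (5 - 3) \cdot 0.5 = 1$.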

Using the CDF option, it is: $y = 0.5 * (1 + \text{ERF}((x - \text{mean}) / (\text{standard deviation} \cdot \sqrt{2})))$, where ERF is the Error Function, used to approximate the CDF of a random variable assumed to be normally distributed. The mean and standard deviation are computed during training.
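
A short sketch of opting into the CDF output, assuming the useCdf parameter of NormalizeMeanVariance and the "Features" column name used earlier:

using Microsoft.ML;

var mlContext = new MLContext();

// Request the CDF form of the output rather than the affine rescale.
var cdfNormalizer = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: true);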

To create this estimator, use one of the following:

  • NormalizeMinMax
  • NormalizeMeanVariance
  • NormalizeLogMeanVariance
  • NormalizeBinning
  • NormalizeSupervisedBinning
  • NormalizeRobustScaling

Check the reference pages for these methods for usage examples.

Methods

Fit(IDataView)

Trains and returns a NormalizingTransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.
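
As a usage sketch (the InputRow type, the column values, and the choice of NormalizeMinMax are assumptions made for illustration), fitting the estimator and applying the resulting transformer might look like this:

using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();
var data = mlContext.Data.LoadFromEnumerable(new List<InputRow>
{
    new InputRow { Features = new float[] { 1f, -2f } },
    new InputRow { Features = new float[] { 4f,  1f } },
});

var estimator = mlContext.Transforms.NormalizeMinMax("Features");
NormalizingTransformer transformer = estimator.Fit(data);   // trains the normalization parameters
IDataView normalized = transformer.Transform(data);         // applies them to the input data

// Hypothetical input type; VectorType gives the Features column a known size.
class InputRow
{
    [VectorType(2)]
    public float[] Features { get; set; }
}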

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.
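
For example (a sketch that assumes the mlContext variable from the Fit example above; MLContext is passed as the IHostEnvironment argument):

// Cache the output of the normalizer so downstream estimators that make
// multiple passes over the data read it from the cache.
var cachedPipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .AppendCacheCheckpoint(mlContext);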

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object, rather than just a general ITransformer. However, at the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> where the estimator for which we want to get the transformer is buried somewhere in this chain. For that scenario, this method lets us attach a delegate that will be called once Fit(IDataView) is called.
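
A sketch of capturing the fitted NormalizingTransformer from inside a longer pipeline (the Concatenate step, the column names, and the mlContext and data variables from the earlier Fit sketch are assumptions):

using Microsoft.ML.Transforms;

NormalizingTransformer fittedNormalizer = null;

// The delegate runs when Fit reaches this stage of the chain.
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .WithOnFitDelegate((NormalizingTransformer t) => { fittedNormalizer = t; })
    .Append(mlContext.Transforms.Concatenate("CombinedFeatures", "Features"));

var model = pipeline.Fit(data);
// fittedNormalizer now holds the trained normalizer from inside the chain.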
