

WordEmbeddingEstimator Class

Definition

Text featurizer which converts vectors of text tokens into a numerical vector using a pre-trained embeddings model.

C#
public sealed class WordEmbeddingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.Transforms.Text.WordEmbeddingTransformer>
F#
type WordEmbeddingEstimator = class
    interface IEstimator<WordEmbeddingTransformer>
Visual Basic
Public NotInheritable Class WordEmbeddingEstimator
Implements IEstimator(Of WordEmbeddingTransformer)
Inheritance
Object → WordEmbeddingEstimator
Implements
IEstimator<WordEmbeddingTransformer>

Remarks

Estimator Characteristics

Does this estimator need to look at the data to train its parameters? No
Input column data type: Vector of Text
Output column data type: Known-sized vector of Single
Exportable to ONNX: No

The WordEmbeddingTransformer produces a new column, named as specified in the output column name parameter, in which each input vector is mapped to a numerical vector of size 3 × D, where D is the dimensionality of the embedding model used. This size does not depend on the length of the input vector.

For example, GloVe50D is 50-dimensional, so the output column is a vector of size 150. The first third of the slots contains the minimum values across the embeddings of the strings in the input vector, the second third contains the average of those embeddings, and the last third contains their maximum values. Together, the minimum and maximum define a bounding hyper-rectangle for the words in the embedding space. This helps with longer phrases, where the average of many word vectors can drown out the useful signal.
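To make the slot layout concrete, here is a standalone C# sketch of the min/average/max pooling described above. `EmbeddingPooling.MinMeanMax` is a hypothetical helper, not part of ML.NET: it takes the embeddings already looked up for the tokens of one input row and returns a vector of size 3 × D. How the real transformer handles tokens missing from the model, or rows with no known tokens, is not covered here; returning zeros for an empty row is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ML.NET) that mimics the documented layout:
// [min over tokens | average over tokens | max over tokens], each D slots wide.
public static class EmbeddingPooling
{
    public static float[] MinMeanMax(IReadOnlyList<float[]> embeddings, int dimension)
    {
        var result = new float[3 * dimension];

        // Assumption of this sketch: a row with no embeddings yields all zeros.
        if (embeddings.Count == 0)
            return result;

        for (int d = 0; d < dimension; d++)
        {
            float min = float.PositiveInfinity;
            float max = float.NegativeInfinity;
            float sum = 0f;

            foreach (var e in embeddings)
            {
                if (e.Length != dimension)
                    throw new ArgumentException("All embeddings must have the same dimension.");
                min = Math.Min(min, e[d]);
                max = Math.Max(max, e[d]);
                sum += e[d];
            }

            result[d] = min;                              // first third: minimum
            result[dimension + d] = sum / embeddings.Count; // second third: average
            result[2 * dimension + d] = max;              // last third: maximum
        }

        return result;
    }
}
```

With two 2-dimensional embeddings {1, 4} and {3, 2}, the sketch produces the six slots min {1, 2}, average {2, 3}, max {3, 4}; with 50-dimensional embeddings it produces 150 slots, matching the GloVe50D example.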

The user can specify a custom pre-trained embeddings model or choose one of the available pre-trained models, which include several versions of GloVe, FastText, and SSWE (sentiment-specific word embedding).
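A minimal sketch of using the estimator in a pipeline, assuming the Microsoft.ML NuGet package is referenced. It creates the estimator through the `ApplyWordEmbedding` extension on `mlContext.Transforms.Text` and passes the model kind explicitly. Because the input column must be a vector of text, the sentence is normalized and tokenized first. The `TextData` and `TransformedTextData` classes, the column names, and the sample sentence are hypothetical. The pretrained model file may be fetched when the pipeline is fit, so this step may need network access.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input and output classes for this sketch.
public class TextData
{
    public string Text { get; set; }
}

public class TransformedTextData
{
    public float[] Features { get; set; }
}

public static class WordEmbeddingExample
{
    // Fits the pipeline on one sentence and returns the length of the embedding vector.
    public static int FeatureLength()
    {
        var mlContext = new MLContext();
        var samples = new List<TextData>
        {
            new TextData { Text = "This is an example to compute word embeddings." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // The estimator expects a vector of text, so normalize and tokenize first.
        var pipeline = mlContext.Transforms.Text.NormalizeText("Text")
            .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
            .Append(mlContext.Transforms.Text.ApplyWordEmbedding("Features", "Tokens",
                WordEmbeddingEstimator.PretrainedModelKind.SentimentSpecificWordEmbedding));

        ITransformer model = pipeline.Fit(data);

        // Run the fitted pipeline on a single example.
        var engine = mlContext.Model.CreatePredictionEngine<TextData, TransformedTextData>(model);
        float[] features = engine.Predict(samples[0]).Features;
        return features.Length;
    }
}
```

Per the remarks above, the returned length is three times the dimensionality of the chosen model, regardless of how many tokens the sentence contains.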

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Trains and returns a WordEmbeddingTransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Appends a 'caching checkpoint' to the estimator chain. This ensures that downstream estimators are trained against cached data. A caching checkpoint is helpful before trainers that make multiple passes over the data.
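A sketch, assuming the Microsoft.ML package and a hypothetical dataset with a text column `Text` and a boolean column `Label`, of placing a cache checkpoint between word-embedding featurization and a multi-pass trainer. `SdcaLogisticRegression` is used only as an example of a trainer that iterates over the data several times. The method only builds the pipeline; it does not fit it.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public static class CachedPipelineExample
{
    // Builds (but does not fit) a pipeline: tokenize -> embed -> cache -> train.
    // The column names "Text", "Tokens", "Features" and "Label" are assumptions of this sketch.
    public static IEstimator<ITransformer> Build(MLContext mlContext)
    {
        return mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Text.ApplyWordEmbedding("Features", "Tokens",
                WordEmbeddingEstimator.PretrainedModelKind.SentimentSpecificWordEmbedding))
            // The trainer below is trained against cached output of the steps above.
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label", featureColumnName: "Features"));
    }
}
```

Placing the checkpoint after the embedding step rather than at the start means the cached data already contains the embedding features used by the trainer.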

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> objects are often combined into pipelines with many other objects, so the estimator whose transformer we want may be buried somewhere inside an EstimatorChain<TLastTransformer>. In that scenario, this method lets us attach a delegate that is called once Fit is called.
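A sketch of that scenario, assuming the Microsoft.ML package: the `WordEmbeddingEstimator` sits in the middle of a chain, and `WithOnFitDelegate` captures its typed `WordEmbeddingTransformer` when the whole chain is fit. The `SentenceData` class, the column names, the sample sentence, and the trailing `NormalizeMinMax` step (included only so the embedding estimator is not last in the chain) are assumptions of this sketch. Fitting may download the pretrained model.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input class for this sketch.
public class SentenceData
{
    public string Text { get; set; }
}

public static class OnFitDelegateExample
{
    // Fits a chain and returns the WordEmbeddingTransformer captured by the delegate.
    public static WordEmbeddingTransformer FitAndCapture()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(new List<SentenceData>
        {
            new SentenceData { Text = "word embeddings inside a pipeline" },
        });

        // Receives the typed transformer once the embedding estimator is fit.
        WordEmbeddingTransformer embeddingTransformer = null;

        var pipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Text
                .ApplyWordEmbedding("Features", "Tokens",
                    WordEmbeddingEstimator.PretrainedModelKind.SentimentSpecificWordEmbedding)
                // Invoked with the WordEmbeddingTransformer produced by this estimator.
                .WithOnFitDelegate(transformer => embeddingTransformer = transformer))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        // Fitting the chain fits each estimator in turn, which triggers the delegate.
        pipeline.Fit(data);
        return embeddingTransformer;
    }
}
```

Without the delegate, the fitted chain would expose the embedding step only as one element of a TransformerChain; the delegate hands it over directly with its concrete type.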

Applies to

See also