TextCatalog.FeaturizeText 方法

参考

定义

命名空间:: Microsoft.ML

程序集:: Microsoft.ML.Transforms.dll

包:: Microsoft.ML v4.0.1

包:: Microsoft.ML v1.0.0

包:: Microsoft.ML v1.1.0

包:: Microsoft.ML v1.2.0

包:: Microsoft.ML v1.3.1

包:: Microsoft.ML v1.4.0

包:: Microsoft.ML v1.5.5

包:: Microsoft.ML v1.6.0

包:: Microsoft.ML v1.7.0

包:: Microsoft.ML v2.0.1

包:: Microsoft.ML v3.0.1

包:: Microsoft.ML v5.0.0-preview.1.25125.4

重要

一些信息与预发行产品相关，相应产品在发行之前可能会进行重大修改。对于此处提供的信息，Microsoft 不作任何明示或暗示的担保。

重载

FeaturizeText(TransformsCatalog+TextTransforms, String, String)	创建一个 TextFeaturizingEstimator，它将文本列转换为特征化向量，该矢量 Single 表示 n 元语法和 char-gram 的规范化计数。
FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])	创建一个 TextFeaturizingEstimator，它将文本列转换为特征化向量 Single ，表示 n 元克和字符克的规范化计数。

FeaturizeText(TransformsCatalog+TextTransforms, String, String)

Source:: TextCatalog.cs

Source:: TextCatalog.cs

Source:: TextCatalog.cs

创建一个 TextFeaturizingEstimator，它将文本列转换为特征化向量，该矢量 Single 表示 n 元语法和 char-gram 的规范化计数。

public static Microsoft.ML.Transforms.Text.TextFeaturizingEstimator FeaturizeText(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default);

static member FeaturizeText : Microsoft.ML.TransformsCatalog.TextTransforms * string * string -> Microsoft.ML.Transforms.Text.TextFeaturizingEstimator

<Extension()>
Public Function FeaturizeText (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing) As TextFeaturizingEstimator

参数

catalog: TransformsCatalog.TextTransforms

与文本相关的转换的目录。

outputColumnName: String

由转换 inputColumnName生成的列的名称。此列的数据类型将是一 Single个向量。

inputColumnName: String

要转换的列的名称。 If set to null, the value of the outputColumnName will be used as source. 此估算器对文本数据进行操作。

TextFeaturizingEstimator

示例

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class FeaturizeText
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's FeaturizeText API uses a " +
                    "composition of several basic transforms to convert text " +
                    "into numeric features." },

                new TextData(){ Text = "This API can be used as a featurizer to " +
                    "perform text classification." },

                new TextData(){ Text = "There are a number of approaches to text " +
                    "classification." },

                new TextData(){ Text = "One of the simplest and most common " +
                    "approaches is called “Bag of Words”." },

                new TextData(){ Text = "Text classification can be used for a " +
                    "wide variety of tasks" },

                new TextData(){ Text = "such as sentiment analysis, topic " +
                    "detection, intent identification etc." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for converting text into numeric features.
            // The following call to 'FeaturizeText' instantiates 
            // 'TextFeaturizingEstimator' with default parameters.
            // The default settings for the TextFeaturizingEstimator are
            //      * StopWordsRemover: None
            //      * CaseMode: Lowercase
            //      * OutputTokensColumnName: None
            //      * KeepDiacritics: false, KeepPunctuations: true, KeepNumbers:
            //          true
            //      * WordFeatureExtractor: NgramLength = 1
            //      * CharFeatureExtractor: NgramLength = 3, UseAllLengths = false
            // The length of the output feature vector depends on these settings.
            var textPipeline = mlContext.Transforms.Text.FeaturizeText("Features",
                "Text");

            // Fit to data.
            var textTransformer = textPipeline.Fit(dataview);

            // Create the prediction engine to get the features extracted from the
            // text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(textTransformer);

            // Convert the text into numeric features.
            var prediction = predictionEngine.Predict(samples[0]);

            // Print the length of the feature vector.
            Console.WriteLine($"Number of Features: {prediction.Features.Length}");

            // Print the first 10 feature values.
            Console.Write("Features: ");
            for (int i = 0; i < 10; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");

            //  Expected output:
            //   Number of Features: 332
            //   Features: 0.0857  0.0857  0.0857  0.0857  0.0857  0.0857  0.0857  0.0857  0.0857  0.1715 ...
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}

适用于

FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])

Source:: TextCatalog.cs

Source:: TextCatalog.cs

Source:: TextCatalog.cs

创建一个 TextFeaturizingEstimator，它将文本列转换为特征化向量 Single ，表示 n 元克和字符克的规范化计数。

public static Microsoft.ML.Transforms.Text.TextFeaturizingEstimator FeaturizeText(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Options options, params string[] inputColumnNames);

static member FeaturizeText : Microsoft.ML.TransformsCatalog.TextTransforms * string * Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Options * string[] -> Microsoft.ML.Transforms.Text.TextFeaturizingEstimator

<Extension()>
Public Function FeaturizeText (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, options As TextFeaturizingEstimator.Options, ParamArray inputColumnNames As String()) As TextFeaturizingEstimator

参数

catalog: TransformsCatalog.TextTransforms

与文本相关的转换的目录。

outputColumnName: String

由转换 inputColumnNames生成的列的名称。此列的数据类型将是一 Single个向量。

options: TextFeaturizingEstimator.Options

算法的高级选项。

inputColumnNames: String[]

要转换的列的名称。 If set to null, the value of the outputColumnName will be used as source. 此估算器对文本数据进行操作，并且可以一次转换多个列，从而生成一个作为所有列的结果特征的向量 Single 。

TextFeaturizingEstimator

示例

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

namespace Samples.Dynamic
{
    public static class FeaturizeTextWithOptions
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's FeaturizeText API uses a " +
                "composition of several basic transforms to convert text into " +
                "numeric features." },

                new TextData(){ Text = "This API can be used as a featurizer to " +
                "perform text classification." },

                new TextData(){ Text = "There are a number of approaches to text " +
                "classification." },

                new TextData(){ Text = "One of the simplest and most common " +
                "approaches is called “Bag of Words”." },

                new TextData(){ Text = "Text classification can be used for a " +
                "wide variety of tasks" },

                new TextData(){ Text = "such as sentiment analysis, topic " +
                "detection, intent identification etc." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for converting text into numeric features.
            // The following call to 'FeaturizeText' instantiates
            // 'TextFeaturizingEstimator' with given parameters. The length of the
            // output feature vector depends on these settings.
            var options = new TextFeaturizingEstimator.Options()
            {
                // Also output tokenized words
                OutputTokensColumnName = "OutputTokens",
                CaseMode = TextNormalizingEstimator.CaseMode.Lower,
                // Use ML.NET's built-in stop word remover
                StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options()
                {
                    Language = TextFeaturizingEstimator.Language.English
                },

                WordFeatureExtractor = new WordBagEstimator.Options()
                {
                    NgramLength
                    = 2,
                    UseAllLengths = true
                },

                CharFeatureExtractor = new WordBagEstimator.Options()
                {
                    NgramLength
                    = 3,
                    UseAllLengths = false
                },
            };
            var textPipeline = mlContext.Transforms.Text.FeaturizeText("Features",
                options, "Text");

            // Fit to data.
            var textTransformer = textPipeline.Fit(dataview);

            // Create the prediction engine to get the features extracted from the
            // text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(textTransformer);

            // Convert the text into numeric features.
            var prediction = predictionEngine.Predict(samples[0]);

            // Print the length of the feature vector.
            Console.WriteLine($"Number of Features: {prediction.Features.Length}");

            // Print feature values and tokens.
            Console.Write("Features: ");
            for (int i = 0; i < 10; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");

            Console.WriteLine("\nTokens: " + string.Join(",", prediction
                .OutputTokens));

            //  Expected output:
            //   Number of Features: 282
            //   Features: 0.0941  0.0941  0.0941  0.0941  0.0941  0.0941  0.0941  0.0941  0.0941  0.1881 ...
            //   Tokens: ml.net's,featurizetext,api,uses,composition,basic,transforms,convert,text,numeric,features.
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
            public string[] OutputTokens { get; set; }
        }
    }
}

注解

此转换可以对多个列进行操作。

适用于

通过

TextCatalog.FeaturizeText 方法

定义

重载

FeaturizeText(TransformsCatalog+TextTransforms, String, String)

参数

返回

示例

适用于

FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])

参数

返回

示例

注解

适用于

其他资源