TextCatalog.RemoveStopWords Method

Reference

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Transforms.dll

Package: Microsoft.ML v3.0.1

Other versions: v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0

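Important

Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.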

Creates a CustomStopWordsRemovingEstimator, which copies the data from the column specified in inputColumnName to a new column, outputColumnName, and removes the words specified in stopwords from it.

public static Microsoft.ML.Transforms.Text.CustomStopWordsRemovingEstimator RemoveStopWords (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, params string[] stopwords);

static member RemoveStopWords : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * string[] -> Microsoft.ML.Transforms.Text.CustomStopWordsRemovingEstimator

<Extension()>
Public Function RemoveStopWords (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, ParamArray stopwords As String()) As CustomStopWordsRemovingEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a variable-sized vector of text.

inputColumnName: String

Name of the column to copy the data from. This estimator operates over a vector of text.

stopwords: String[]

Array of words to remove.

Returns

CustomStopWordsRemovingEstimator

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class RemoveStopWords
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. 'RemoveStopWords' does not
            // require training data because the estimator it creates
            // ('CustomStopWordsRemovingEstimator') is not trainable. The empty
            // list is only needed to pass the input schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for removing stop words from input text/string.
            // The pipeline first tokenizes text into words then removes stop words.
            // The 'RemoveStopWords' API ignores casing of the text/string, e.g.
            // 'tHe' and 'the' are considered the same stop word.
            var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Words",
                "Text")
                .Append(mlContext.Transforms.Text.RemoveStopWords(
                "WordsWithoutStopWords", "Words", stopwords:
                new[] { "a", "the", "from", "by" }));

            // Fit to data.
            var textTransformer = textPipeline.Fit(emptyDataView);

            // Create the prediction engine to remove the stop words from the input
            // text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(textTransformer);

            // Call the prediction API to remove stop words.
            var data = new TextData()
            {
                Text = "ML.NET's RemoveStopWords API " +
                "removes stop words from tHe text/string using a list of stop " +
                "words provided by the user."
            };

            var prediction = predictionEngine.Predict(data);

            // Print the length of the word vector after the stop words are removed.
            Console.WriteLine("Number of words: " + prediction.WordsWithoutStopWords
                .Length);

            // Print the word vector without stop words.
            Console.WriteLine("\nWords without stop words: " + string.Join(",",
                prediction.WordsWithoutStopWords));

            //  Expected output:
            //   Number of words: 14
            //   Words without stop words: ML.NET's,RemoveStopWords,API,removes,stop,words,text/string,using,list,of,stop,words,provided,user.
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public string[] WordsWithoutStopWords { get; set; }
        }
    }
}
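
The filtering behaviour that the example relies on can be sketched in plain C#, without ML.NET, to make clear why the sample sentence ends up with 14 words. This is only an illustration under stated assumptions, not the library's implementation: `StopWordSketch` and its `Remove` helper are hypothetical names, splitting on single spaces stands in for `TokenizeIntoWords`, and an ordinal case-insensitive comparison stands in for the case handling that the example describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordSketch
{
    // Hypothetical helper, not part of ML.NET: drops every token that
    // matches one of the stop words, ignoring case.
    public static string[] Remove(IEnumerable<string> tokens,
        params string[] stopwords)
    {
        // Assumption: ordinal, case-insensitive matching approximates the
        // "ignores casing" behaviour described in the example above.
        var stopSet = new HashSet<string>(stopwords,
            StringComparer.OrdinalIgnoreCase);

        return tokens.Where(token => !stopSet.Contains(token)).ToArray();
    }

    public static void Main()
    {
        var text = "ML.NET's RemoveStopWords API removes stop words from tHe " +
            "text/string using a list of stop words provided by the user.";

        // Assumption: splitting on single spaces stands in for
        // TokenizeIntoWords.
        var words = text.Split(' ');

        var kept = Remove(words, "a", "the", "from", "by");

        Console.WriteLine("Number of words: " + kept.Length);
        Console.WriteLine("Words without stop words: " +
            string.Join(",", kept));
    }
}
```

The sentence splits into 19 tokens, and five of them ("from", "tHe", "a", "by", "the") match a stop word, which leaves the same 14 words listed in the expected output of the example. Tokens such as "user." are kept because matching is done on whole tokens, so trailing punctuation prevents a match.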

Applies to
