TextCatalog.NormalizeText Method

Reference

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Transforms.dll

Package: Microsoft.ML (versions v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4)

Source: TextCatalog.cs

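Important

Some information relates to prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.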

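Creates a TextNormalizingEstimator, which normalizes incoming text in inputColumnName by optionally changing case and removing diacritical marks, punctuation marks, and numbers, and outputs the new text as outputColumnName.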

public static Microsoft.ML.Transforms.Text.TextNormalizingEstimator NormalizeText(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode caseMode = Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode.Lower, bool keepDiacritics = false, bool keepPunctuations = true, bool keepNumbers = true);

static member NormalizeText : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode * bool * bool * bool -> Microsoft.ML.Transforms.Text.TextNormalizingEstimator

<Extension()>
Public Function NormalizeText (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional caseMode As TextNormalizingEstimator.CaseMode = Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode.Lower, Optional keepDiacritics As Boolean = False, Optional keepPunctuations As Boolean = True, Optional keepNumbers As Boolean = True) As TextNormalizingEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The text-related transform's catalog.

outputColumnName: String

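Name of the column resulting from the transformation of inputColumnName. This column's data type is a scalar or a vector of text, depending on the input column's data type.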

inputColumnName: String

Name of the column to transform. If set to null, the value of outputColumnName is used as the source. This estimator operates on text or vector-of-text data types.

caseMode: TextNormalizingEstimator.CaseMode

How to change the casing of the text, using the rules of the invariant culture.

keepDiacritics: Boolean

Whether to keep diacritical marks or remove them.

keepPunctuations: Boolean

Whether to keep punctuation marks or remove them.

keepNumbers: Boolean

Whether to keep numbers or remove them.
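The sketch below illustrates how the optional parameters above can be combined through named arguments. It is a minimal illustration, not part of the official sample: the Input and Output row classes, the Print helper, and the sample sentence are hypothetical names introduced here. No expected output is listed, since the exact result depends on the library's normalization rules.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public static class NormalizeTextOptionsSketch
{
    // Hypothetical row types used only for this illustration.
    public class Input
    {
        public string Text { get; set; }
    }

    public class Output
    {
        public string Normalized { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        var rows = new List<Input>
        {
            new Input { Text = "Crème Brûlée, 2 for $9!" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Defaults from the signature: CaseMode.Lower, keepDiacritics: false,
        // keepPunctuations: true, keepNumbers: true.
        Print(mlContext, data,
            mlContext.Transforms.Text.NormalizeText("Normalized", "Text"));

        // Leave casing untouched, keep diacritical marks, drop punctuation.
        Print(mlContext, data,
            mlContext.Transforms.Text.NormalizeText("Normalized", "Text",
                caseMode: TextNormalizingEstimator.CaseMode.None,
                keepDiacritics: true,
                keepPunctuations: false));

        // Uppercase the text and drop numbers; other options keep their defaults.
        Print(mlContext, data,
            mlContext.Transforms.Text.NormalizeText("Normalized", "Text",
                caseMode: TextNormalizingEstimator.CaseMode.Upper,
                keepNumbers: false));
    }

    // Fits the estimator, transforms the data, and prints the output column.
    private static void Print(MLContext mlContext, IDataView data,
        TextNormalizingEstimator estimator)
    {
        IDataView transformed = estimator.Fit(data).Transform(data);
        foreach (var row in mlContext.Data.CreateEnumerable<Output>(
            transformed, reuseRowObject: false))
        {
            Console.WriteLine(row.Normalized);
        }
    }
}
```

Named arguments make it possible to override a single option, such as keepNumbers, without restating the ones that keep their defaults.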

Returns

TextNormalizingEstimator

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

namespace Samples.Dynamic
{
    public static class NormalizeText
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. The 'NormalizeText' API does not
            // require training data as the estimator ('TextNormalizingEstimator')
            // created by 'NormalizeText' API is not a trainable estimator. The
            // empty list is only needed to pass input schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for normalizing text.
            var normTextPipeline = mlContext.Transforms.Text.NormalizeText(
                "NormalizedText", "Text", TextNormalizingEstimator.CaseMode.Lower,
                keepDiacritics: false,
                keepPunctuations: false,
                keepNumbers: false);

            // Fit to data.
            var normTextTransformer = normTextPipeline.Fit(emptyDataView);

            // Create the prediction engine to get the normalized text from the
            // input text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(normTextTransformer);

            // Call the prediction API.
            var data = new TextData()
            {
                Text = "ML.NET's NormalizeText API " +
                "changes the case of the TEXT and removes/keeps diâcrîtîcs, " +
                "punctuations, and/or numbers (123)."
            };

            var prediction = predictionEngine.Predict(data);

            // Print the normalized text.
            Console.WriteLine($"Normalized Text: {prediction.NormalizedText}");

            //  Expected output:
            //   Normalized Text: mlnets normalizetext api changes the case of the text and removeskeeps diacritics punctuations andor numbers
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public string NormalizedText { get; set; }
        }
    }
}
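As a complement to the sample above, the following sketch shows the case where inputColumnName is omitted, so the column named by outputColumnName serves as both source and destination. It also reads the result with Transform and CreateEnumerable instead of a prediction engine. The Review class and the sample strings are hypothetical, and the printed values are intentionally not listed here.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class NormalizeTextInPlaceSketch
{
    // Hypothetical row type used only for this illustration.
    public class Review
    {
        public string Text { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var reviews = new List<Review>
        {
            new Review { Text = "GREAT Café!" },
            new Review { Text = "Not bad... 4/5." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(reviews);

        // Only outputColumnName is given, so inputColumnName stays null and
        // the "Text" column is used as the source. The normalized values are
        // written to a column with the same name.
        var pipeline = mlContext.Transforms.Text.NormalizeText("Text");

        // The estimator is not trainable, but Fit is still needed to obtain
        // a transformer bound to the input schema.
        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Read the rows back; "Text" now refers to the normalized column.
        foreach (var review in mlContext.Data.CreateEnumerable<Review>(
            transformed, reuseRowObject: false))
        {
            Console.WriteLine(review.Text);
        }
    }
}
```

Writing to the same column name is convenient when the raw text is not needed downstream. Use a separate output column, as in the sample above, when both versions should remain available.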
