TextCatalog.LatentDirichletAllocation 메서드

참조

정의

네임스페이스:: Microsoft.ML

어셈블리:: Microsoft.ML.Transforms.dll

패키지:: Microsoft.ML v4.0.1

패키지:: Microsoft.ML v1.0.0

패키지:: Microsoft.ML v1.1.0

패키지:: Microsoft.ML v1.2.0

패키지:: Microsoft.ML v1.3.1

패키지:: Microsoft.ML v1.4.0

패키지:: Microsoft.ML v1.5.5

패키지:: Microsoft.ML v1.6.0

패키지:: Microsoft.ML v1.7.0

패키지:: Microsoft.ML v2.0.1

패키지:: Microsoft.ML v3.0.1

패키지:: Microsoft.ML v5.0.0-preview.1.25125.4

Source:: TextCatalog.cs

Source:: TextCatalog.cs

Source:: TextCatalog.cs

중요

일부 정보는 릴리스되기 전에 상당 부분 수정될 수 있는 시험판 제품과 관련이 있습니다. Microsoft는 여기에 제공된 정보에 대해 어떠한 명시적이거나 묵시적인 보증도 하지 않습니다.

LightLDA를 LatentDirichletAllocationEstimator사용하여 텍스트(부동 소수 민족의 벡터로 표시됨)를 식별된 각 항목과 텍스트의 유사성을 나타내는 벡터 Single 로 변환하는 을 만듭니다.

public static Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator LatentDirichletAllocation(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfTopics = 100, float alphaSum = 100, float beta = 0.01, int samplingStepCount = 4, int maximumNumberOfIterations = 200, int likelihoodInterval = 5, int numberOfThreads = 0, int maximumTokenCountPerDocument = 512, int numberOfSummaryTermsPerTopic = 10, int numberOfBurninIterations = 10, bool resetRandomGenerator = false);

static member LatentDirichletAllocation : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * single * single * int * int * int * int * int * int * int * bool -> Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator

<Extension()>
Public Function LatentDirichletAllocation (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfTopics As Integer = 100, Optional alphaSum As Single = 100, Optional beta As Single = 0.01, Optional samplingStepCount As Integer = 4, Optional maximumNumberOfIterations As Integer = 200, Optional likelihoodInterval As Integer = 5, Optional numberOfThreads As Integer = 0, Optional maximumTokenCountPerDocument As Integer = 512, Optional numberOfSummaryTermsPerTopic As Integer = 10, Optional numberOfBurninIterations As Integer = 10, Optional resetRandomGenerator As Boolean = false) As LatentDirichletAllocationEstimator

매개 변수

catalog: TransformsCatalog.TextTransforms

변환의 카탈로그입니다.

outputColumnName: String

의 변환에서 생성된 열의 inputColumnName이름입니다. 이 추정기는 벡터를 Single출력합니다.

inputColumnName: String

변환할 열의 이름입니다. 이 값으로 null설정하면 값이 outputColumnName 원본으로 사용됩니다. 이 추정기는 .의 Single벡터에서 작동합니다.

numberOfTopics: Int32

항목 수입니다.

alphaSum: Single

문서 토픽 벡터에 앞서 Dirichlet을 실행합니다.

beta: Single

어휘 주제 벡터에 앞서 Dirichlet.

samplingStepCount: Int32

메트로폴리스 헤이스팅스 단계의 수입니다.

maximumNumberOfIterations: Int32

반복 횟수입니다.

likelihoodInterval: Int32

이 반복 간격에서 로컬 데이터 세트에 대한 로그 가능성을 계산합니다.

numberOfThreads: Int32

학습 스레드 수입니다. 기본값은 논리 프로세서 수에 따라 달라집니다.

maximumTokenCountPerDocument: Int32

문서당 최대 토큰 수 임계값입니다.

numberOfSummaryTermsPerTopic: Int32

토픽을 요약할 단어 수입니다.

numberOfBurninIterations: Int32

번인 반복 횟수입니다.

resetRandomGenerator: Boolean

각 문서에 대한 난수 생성기를 다시 설정합니다.

반환

LatentDirichletAllocationEstimator

예제

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class LatentDirichletAllocation
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "computes topic models." },

                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "is the best for topic models." },

                new TextData(){ Text = "I like to eat broccoli and bananas." },
                new TextData(){ Text = "I eat bananas for breakfast." },
                new TextData(){ Text = "This car is expensive compared to last " +
                "week's price." },

                new TextData(){ Text = "This car was $X last week." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for featurizing the text/string using 
            // LatentDirichletAllocation API. o be more accurate in computing the
            // LDA features, the pipeline first normalizes text and removes stop
            // words before passing tokens (the individual words, lower cased, with
            // common words removed) to LatentDirichletAllocation.
            var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText",
                "Text")
                .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens",
                    "NormalizedText"))
                .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
                .Append(mlContext.Transforms.Text.ProduceNgrams("Tokens"))
                .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                    "Features", "Tokens", numberOfTopics: 3));

            // Fit to data.
            var transformer = pipeline.Fit(dataview);

            // Create the prediction engine to get the LDA features extracted from
            // the text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(transformer);

            // Convert the sample text into LDA features and print it.
            PrintLdaFeatures(predictionEngine.Predict(samples[0]));
            PrintLdaFeatures(predictionEngine.Predict(samples[1]));

            // Features obtained post-transformation.
            // For LatentDirichletAllocation, we had specified numTopic:3. Hence
            // each prediction has been featurized as a vector of floats with length
            // 3.

            //  Topic1  Topic2  Topic3
            //  0.6364  0.2727  0.0909
            //  0.5455  0.1818  0.2727
        }

        private static void PrintLdaFeatures(TransformedTextData prediction)
        {
            for (int i = 0; i < prediction.Features.Length; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");
            Console.WriteLine();
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}

적용 대상

다음을 통해 공유

TextCatalog.LatentDirichletAllocation 메서드

정의

매개 변수

반환

예제

적용 대상

추가 리소스