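TextCatalog.ProduceWordBags Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads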
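ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32) | Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria) | Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria) | Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.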
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, char termSeparator, char freqSeparator, string inputColumnName = default, int maximumNgramsCount = 10000000);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * char * char * string * int -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, termSeparator As Char, freqSeparator As Char, Optional inputColumnName As String = Nothing, Optional maximumNgramsCount As Integer = 10000000) As WordBagEstimator
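Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
- termSeparator
- Char
The separator used to separate term/frequency pairs.
- freqSeparator
- Char
The separator used to separate a term from its frequency.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over vectors of text.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.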
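Returns
WordBagEstimator
Remarks
WordBagEstimator differs from NgramExtractingEstimator in that the former tokenizes text internally, whereas the latter takes already-tokenized text as input.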
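The term/frequency overload is meant for input that already carries counts. The following is a minimal sketch, assuming an ML.NET package version that includes this (prerelease) overload and assuming each row is a string such as "apple:2,banana:1", where ',' plays the role of termSeparator and ':' the role of freqSeparator. The class names (TermFrequencyInput, TermFrequencyOutput, TermFrequencyWordBagExample), the column names, and the sample data are hypothetical; the exact parsing rules (for example, handling of whitespace or malformed pairs) have not been verified here.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input schema: one string of pre-counted "term:frequency" pairs per row.
public class TermFrequencyInput
{
    public string Pairs { get; set; }
}

// Hypothetical output schema: adds the vector produced by ProduceWordBags.
public class TermFrequencyOutput : TermFrequencyInput
{
    public float[] Features { get; set; }
}

public static class TermFrequencyWordBagExample
{
    public static float[] Run()
    {
        var mlContext = new MLContext();

        // Assumed input format: pairs separated by ',', term and frequency separated by ':'.
        var samples = new List<TermFrequencyInput>
        {
            new TermFrequencyInput { Pairs = "apple:2,banana:1" },
            new TermFrequencyInput { Pairs = "banana:3,cherry:1" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Named arguments select the (string, char, char, string, int) overload.
        var pipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "Features",
            termSeparator: ',',
            freqSeparator: ':',
            inputColumnName: "Pairs");

        // Fitting builds the n-gram dictionary from the training rows.
        var transformer = pipeline.Fit(data);

        // Transform a single row to inspect its feature vector.
        var engine = mlContext.Model
            .CreatePredictionEngine<TermFrequencyInput, TermFrequencyOutput>(transformer);
        float[] features = engine.Predict(samples[0]).Features;
        Console.WriteLine(string.Join(", ", features));
        return features;
    }
}
```

The vector length equals the size of the dictionary learned during Fit, so it is the same for every row.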
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
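Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over vectors of text.
- ngramLength
- Int32
The n-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength, or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus.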
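The skipLength and useAllLengths parameters decide which n-grams get counted. The hypothetical helper below (SkipGramSketch, not part of ML.NET) enumerates n-grams from an already-tokenized sentence under one stated assumption: skipLength is a total budget of skipped tokens per n-gram, spread across its gaps. It only illustrates the idea; ML.NET's internal enumeration order and exact skip rules may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not an ML.NET API.
public static class SkipGramSketch
{
    // useAllLengths == true: lengths 1..ngramLength; false: only ngramLength.
    public static List<string> Ngrams(string[] tokens, int ngramLength, int skipLength, bool useAllLengths)
    {
        var result = new List<string>();
        int minLength = useAllLengths ? 1 : ngramLength;
        for (int n = minLength; n <= ngramLength; n++)
            for (int start = 0; start < tokens.Length; start++)
                result.AddRange(Extend(tokens, new List<int> { start }, n, skipLength));
        return result;
    }

    // Grows an n-gram from the indices picked so far. Assumed semantics:
    // at most skipsLeft tokens may be skipped in total across all gaps.
    private static IEnumerable<string> Extend(string[] tokens, List<int> picked, int n, int skipsLeft)
    {
        if (picked.Count == n)
        {
            yield return string.Join(" ", picked.Select(i => tokens[i]));
            yield break;
        }
        int last = picked[picked.Count - 1];
        // The next token follows immediately (skip = 0) or after skipping up to skipsLeft tokens.
        for (int skip = 0; skip <= skipsLeft && last + 1 + skip < tokens.Length; skip++)
        {
            picked.Add(last + 1 + skip);
            foreach (var gram in Extend(tokens, picked, n, skipsLeft - skip))
                yield return gram;
            picked.RemoveAt(picked.Count - 1);
        }
    }
}
```

For example, Ngrams(new[] { "a", "b", "c" }, 2, 1, false) returns "a b", "a c", "b c": the skip budget of 1 adds the skip-gram "a c". With skipLength 0 and useAllLengths true, the result is "a", "b", "c", "a b", "b c".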
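Returns
WordBagEstimator
Remarks
WordBagEstimator differs from NgramExtractingEstimator in that the former tokenizes text internally, whereas the latter takes already-tokenized text as input.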
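A minimal sketch of this overload on a single text column, assuming the Microsoft.ML NuGet package is referenced. TextData, TransformedTextData, WordBagExample, the column names, and the sample sentences are hypothetical. To illustrate the remark above, the sketch also builds a comparison pipeline for NgramExtractingEstimator (ProduceNgrams), which needs the text tokenized and mapped to keys first; the two pipelines are not claimed to produce identical vectors.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input schema with one free-text column.
public class TextData
{
    public string Text { get; set; }
}

// Hypothetical output schema: adds the n-gram count vector.
public class TransformedTextData : TextData
{
    public float[] BagOfWords { get; set; }
}

public static class WordBagExample
{
    public static float[] Run()
    {
        var mlContext = new MLContext();
        var samples = new List<TextData>
        {
            new TextData { Text = "This is an example to compute bag-of-word features." },
            new TextData { Text = "ProduceWordBags produces bag-of-word features from input text." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // WordBagEstimator: tokenization happens inside the estimator.
        var wordBagPipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnName: "Text",
            ngramLength: 2,
            useAllLengths: true,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        // For comparison: NgramExtractingEstimator expects tokens that were already
        // split into words and converted to keys, so those steps are explicit here.
        var ngramPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
            .Append(mlContext.Transforms.Text.ProduceNgrams("NgramFeatures", "Tokens",
                ngramLength: 2, useAllLengths: true,
                weighting: NgramExtractingEstimator.WeightingCriteria.Tf));
        var ngramTransformer = ngramPipeline.Fit(data);

        // Fit the word-bag pipeline and transform one row.
        var transformer = wordBagPipeline.Fit(data);
        var engine = mlContext.Model
            .CreatePredictionEngine<TextData, TransformedTextData>(transformer);
        float[] features = engine.Predict(samples[0]).BagOfWords;
        Console.WriteLine($"Number of features: {features.Length}");
        return features;
    }
}
```

Setting weighting to Idf or TfIdf instead of Tf changes the values stored in the vector, not its length.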
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
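Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnNames.
- inputColumnNames
- String[]
Names of the multiple columns to take the data from. This estimator operates over vectors of text.
- ngramLength
- Int32
The n-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength, or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus.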
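Returns
WordBagEstimator
Remarks
WordBagEstimator differs from NgramExtractingEstimator in that the former tokenizes text internally, whereas the latter takes already-tokenized text as input.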
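A minimal sketch of the multi-column overload, assuming the Microsoft.ML NuGet package is referenced and assuming two separate text columns whose words should be counted into one shared vector. ReviewData, ReviewFeatures, MultiColumnWordBagExample, the column names, and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input schema with two text columns.
public class ReviewData
{
    public string Title { get; set; }
    public string Body { get; set; }
}

// Hypothetical output schema: one vector built from both columns.
public class ReviewFeatures : ReviewData
{
    public float[] BagOfWords { get; set; }
}

public static class MultiColumnWordBagExample
{
    public static float[] Run()
    {
        var mlContext = new MLContext();
        var samples = new List<ReviewData>
        {
            new ReviewData { Title = "Great phone", Body = "The battery lasts all day." },
            new ReviewData { Title = "Poor battery", Body = "The phone needs charging twice a day." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Both columns feed a single output column; unigram counts only
        // (ngramLength 1, useAllLengths false) to keep the vector small.
        var pipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" },
            ngramLength: 1,
            useAllLengths: false,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        var transformer = pipeline.Fit(data);
        var engine = mlContext.Model
            .CreatePredictionEngine<ReviewData, ReviewFeatures>(transformer);
        float[] features = engine.Predict(samples[0]).BagOfWords;
        Console.WriteLine($"Number of features: {features.Length}");
        return features;
    }
}
```

Using one output column for several inputs is convenient when the columns are equally informative; if a title word should be distinguished from the same word in the body, calling the single-column overload once per column keeps them in separate vectors.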