ConversionsExtensionsCatalog.Hash 方法
定義
重要
部分資訊涉及發行前產品,在發行之前可能會有大幅修改。 Microsoft 對此處提供的資訊,不做任何明確或隱含的瑕疵擔保。
多載
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[]) |
建立 HashingEstimator ,它會將輸入資料行的資料類型 InputColumnName 雜湊至新的資料行: Name 。 |
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32) |
建立 HashingEstimator ,它會將資料從 中指定的 |
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])
建立 HashingEstimator ,它會將輸入資料行的資料類型 InputColumnName 雜湊至新的資料行: Name 。
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator
參數
轉換的目錄。
- columns
- HashingEstimator.ColumnOptions[]
包含輸入和輸出資料行名稱之估算器的進階選項。 此估算器會透過文字、數值、布林值、索引鍵和 DataViewRowId 資料類型運作。 新資料行的資料類型將是 的 UInt32 向量,或 UInt32 根據輸入資料行資料類型是向量或純量。
傳回
範例
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
namespace Samples.Dynamic
{
// This example demonstrates hashing of categorical string and integer data types by using Hash transform's
// advanced options API.
public static class HashWithOptions
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext(seed: 1);
// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "NFL" , Age = 14 },
new DataPoint() { Category = "NFL" , Age = 15 },
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "MLS" , Age = 14 },
};
var data = mlContext.Data.LoadFromEnumerable(rawData);
// Construct the pipeline that would hash the two columns and store the
// results in new columns. The first transform hashes the string column
// and the second transform hashes the integer column.
//
// Hashing is not a reversible operation, so there is no way to retrieve
// the original value from the hashed value. Sometimes, for debugging,
// or model explainability, users will need to know what values in the
// original columns generated the values in the hashed columns, since
// the algorithms will mostly use the hashed values for further
// computations. The Hash method will preserve the mapping from the
// original values to the hashed values in the Annotations of the newly
// created column (column populated with the hashed values).
//
// Setting the maximumNumberOfInverts parameters to -1 will preserve the
// full map. If that parameter is left to the default 0 value, the
// mapping is not preserved.
var pipeline = mlContext.Transforms.Conversion.Hash(
new[]
{
new HashingEstimator.ColumnOptions(
"CategoryHashed",
"Category",
16,
useOrderedHashing: false,
maximumNumberOfInverts: -1),
new HashingEstimator.ColumnOptions(
"AgeHashed",
"Age",
8,
useOrderedHashing: false)
});
// Let's fit our pipeline, and then apply it to the same data.
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
// Convert the post transformation from the IDataView format to an
// IEnumerable <TransformedData> for easy consumption.
var convertedData = mlContext.Data.CreateEnumerable<
TransformedDataPoint>(transformedData, true);
Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
foreach (var item in convertedData)
Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
$"{item.Age}\t {item.AgeHashed}");
// Expected data after the transformation.
//
// Category CategoryHashed Age AgeHashed
// MLB 36206 18 127
// NFL 19015 14 62
// NFL 19015 15 43
// MLB 36206 18 127
// MLS 6013 14 62
// For the Category column, where we set the maximumNumberOfInverts
// parameter, the names of the original categories, and their
// correspondence with the generated hash values is preserved in the
// Annotations in the format of indices and values.the indices array
// will have the hashed values, and the corresponding element,
// position -wise, in the values array will contain the original value.
//
// See below for an example on how to retrieve the mapping.
var slotNames = new VBuffer<ReadOnlyMemory<char>>();
transformedData.Schema["CategoryHashed"].Annotations.GetValue(
"KeyValues", ref slotNames);
var indices = slotNames.GetIndices();
var categoryNames = slotNames.GetValues();
for (int i = 0; i < indices.Length; i++)
Console.WriteLine($"The original value of the {indices[i]} " +
$"category is {categoryNames[i]}");
// Output Data
//
// The original value of the 6012 category is MLS
// The original value of the 19014 category is NFL
// The original value of the 36205 category is MLB
}
public class DataPoint
{
public string Category { get; set; }
public uint Age { get; set; }
}
public class TransformedDataPoint : DataPoint
{
public uint CategoryHashed { get; set; }
public uint AgeHashed { get; set; }
}
}
}
備註
此轉換可以透過數個數據行運作。
適用於
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)
建立 HashingEstimator ,它會將資料從 中指定的 inputColumnName
資料行雜湊至新的資料行: outputColumnName
。
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator
參數
轉換的目錄。
- outputColumnName
- String
轉換所產生的 inputColumnName
資料行名稱。
此資料行的資料類型會是索引鍵的向量,或根據輸入資料行資料類型是向量或純量索引鍵的純量。
- inputColumnName
- String
要雜湊其資料的資料行名稱。
如果設定為 null
,則會將 的值 outputColumnName
當做來源使用。
此估算器會透過文字、數值、布林值、索引鍵或資料類型的向量或 DataViewRowId 純量運作。
- numberOfBits
- Int32
要雜湊到的位數。 必須介於 1 到 31 之間,包含。
- maximumNumberOfInverts
- Int32
在雜湊處理期間,我們會建構原始值與所產生雜湊值之間的對應。
原始值的文字表示會儲存在新資料行之批註的位置名稱中。因此,雜湊可以將許多初始值對應至一個。
maximumNumberOfInverts
指定對應至應保留之雜湊的相異輸入值數目上限。
0 不會保留任何輸入值。 -1 會保留與每個雜湊對應的所有輸入值。
傳回
範例
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic
{
// This example demonstrates hashing of categorical string and integer data types.
public static class Hash
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext(seed: 1);
// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "NFL" , Age = 14 },
new DataPoint() { Category = "NFL" , Age = 15 },
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "MLS" , Age = 14 },
};
var data = mlContext.Data.LoadFromEnumerable(rawData);
// Construct the pipeline that would hash the two columns and store the
// results in new columns. The first transform hashes the string column
// and the second transform hashes the integer column.
//
// Hashing is not a reversible operation, so there is no way to retrieve
// the original value from the hashed value. Sometimes, for debugging,
// or model explainability, users will need to know what values in the
// original columns generated the values in the hashed columns, since
// the algorithms will mostly use the hashed values for further
// computations. The Hash method will preserve the mapping from the
// original values to the hashed values in the Annotations of the newly
// created column (column populated with the hashed values).
//
// Setting the maximumNumberOfInverts parameters to -1 will preserve the
// full map. If that parameter is left to the default 0 value, the
// mapping is not preserved.
var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
"Category", numberOfBits: 16, maximumNumberOfInverts: -1)
.Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
numberOfBits: 8));
// Let's fit our pipeline, and then apply it to the same data.
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
// Convert the post transformation from the IDataView format to an
// IEnumerable <TransformedData> for easy consumption.
var convertedData = mlContext.Data.CreateEnumerable<
TransformedDataPoint>(transformedData, true);
Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
foreach (var item in convertedData)
Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
$"{item.Age}\t {item.AgeHashed}");
// Expected data after the transformation.
//
// Category CategoryHashed Age AgeHashed
// MLB 36206 18 127
// NFL 19015 14 62
// NFL 19015 15 43
// MLB 36206 18 127
// MLS 6013 14 62
// For the Category column, where we set the maximumNumberOfInverts
// parameter, the names of the original categories, and their
// correspondence with the generated hash values is preserved in the
// Annotations in the format of indices and values.the indices array
// will have the hashed values, and the corresponding element,
// position -wise, in the values array will contain the original value.
//
// See below for an example on how to retrieve the mapping.
var slotNames = new VBuffer<ReadOnlyMemory<char>>();
transformedData.Schema["CategoryHashed"].Annotations.GetValue(
"KeyValues", ref slotNames);
var indices = slotNames.GetIndices();
var categoryNames = slotNames.GetValues();
for (int i = 0; i < indices.Length; i++)
Console.WriteLine($"The original value of the {indices[i]} " +
$"category is {categoryNames[i]}");
// Output Data
//
// The original value of the 6012 category is MLS
// The original value of the 19014 category is NFL
// The original value of the 36205 category is MLB
}
public class DataPoint
{
public string Category { get; set; }
public uint Age { get; set; }
}
public class TransformedDataPoint : DataPoint
{
public uint CategoryHashed { get; set; }
public uint AgeHashed { get; set; }
}
}
}