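ConversionsExtensionsCatalog.Hash Method
Definition
Important
Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads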
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[]) |
Creates a HashingEstimator, which hashes the data in the input column InputColumnName into a new column, Name. |
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32) |
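Creates a HashingEstimator, which hashes the data from the column specified in inputColumnName into a new column, outputColumnName. |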
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])
Creates a HashingEstimator, which hashes the data in the input column InputColumnName into a new column, Name.
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator
Parameters
- catalog
- TransformsCatalog.ConversionTransforms
The transform's catalog.
- columns
- HashingEstimator.ColumnOptions[]
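Advanced options for the estimator, which also contain the input and output column names. This estimator operates over text, numeric, boolean, key, and DataViewRowId data types. The new column's data type is a vector of UInt32 or a scalar UInt32, depending on whether the input column's data type is a vector or a scalar.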
Returns
HashingEstimator
Examples
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer
    // data types by using the Hash transform's advanced options API.
    public static class HashWithOptions
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB", Age = 18 },
                new DataPoint() { Category = "NFL", Age = 14 },
                new DataPoint() { Category = "NFL", Age = 15 },
                new DataPoint() { Category = "MLB", Age = 18 },
                new DataPoint() { Category = "MLS", Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that hashes the two columns and stores the
            // results in new columns. The first set of column options hashes the
            // string column and the second hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging
            // or model explainability, users need to know which values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms mostly use the hashed values for further
            // computations. The Hash method can preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (the column populated with the hashed values).
            //
            // Setting the maximumNumberOfInverts parameter to -1 preserves the
            // full map. If that parameter is left at its default value of 0, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash(
                new[]
                {
                    new HashingEstimator.ColumnOptions(
                        "CategoryHashed",
                        "Category",
                        16,
                        useOrderedHashing: false,
                        maximumNumberOfInverts: -1),
                    new HashingEstimator.ColumnOptions(
                        "AgeHashed",
                        "Age",
                        8,
                        useOrderedHashing: false)
                });

            // Fit the pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the transformed data from the IDataView format to an
            // IEnumerable<TransformedDataPoint> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age   AgeHashed
            // MLB      36206            18    127
            // NFL      19015            14    62
            // NFL      19015            15    43
            // MLB      36206            18    127
            // MLS      6013             14    62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories and their
            // correspondence with the generated hash values are preserved in the
            // Annotations as indices and values. The indices array holds the
            // hashed values, and the element at the same position in the values
            // array holds the original value. The indices are zero-based, so each
            // one is one less than the corresponding hash value printed above.
            //
            // See below for an example of how to retrieve the mapping.
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            //
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }
    }
}
Remarks
This transform can operate over several columns.
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)
Creates a HashingEstimator, which hashes the data from the column specified in inputColumnName into a new column, outputColumnName.
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator
Parameters
- catalog
- TransformsCatalog.ConversionTransforms
The conversion transform's catalog.
- outputColumnName
- String
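Name of the column resulting from the transformation of inputColumnName.
This column's data type is a vector of keys or a scalar key, depending on whether the input column's data type is a vector or a scalar.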
- inputColumnName
- String
Name of the column whose data will be hashed.
If set to null, the value of outputColumnName is used as the source.
This estimator operates over vectors or scalars of text, numeric, boolean, key, or DataViewRowId data types.
- numberOfBits
- Int32
Number of bits to hash into. Must be between 1 and 31, inclusive.
- maximumNumberOfInverts
- Int32
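During hashing, a mapping is constructed between the original values and the produced hash values.
The text representation of the original values is stored in the slot names of the annotations for the new column. Because of this, hashing can map many initial values to one.
maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained.
0 does not retain any input values. -1 retains all input values mapping to each hash.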
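The interplay between numberOfBits and maximumNumberOfInverts can be sketched in plain C#, without ML.NET. The class HashSketch and its methods BucketCount and BuildInverts are hypothetical names introduced only for illustration. The sketch assumes that the hash space holds 2^numberOfBits distinct values, which is consistent with the 16-bit and 8-bit outputs in the samples on this page, and it takes the hash function as a caller-supplied delegate rather than reproducing the estimator's actual hash.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helpers only; none of these names exist in ML.NET.
public static class HashSketch
{
    // Size of the hash space, assuming 2^numberOfBits distinct hash values.
    public static ulong BucketCount(int numberOfBits)
    {
        if (numberOfBits < 1 || numberOfBits > 31)
            throw new ArgumentOutOfRangeException(nameof(numberOfBits),
                "numberOfBits must be between 1 and 31, inclusive.");
        return 1UL << numberOfBits;
    }

    // Collects, per hash value, the distinct originals that map to it,
    // keeping at most maximumNumberOfInverts of them (0: none, -1: all).
    // The hash delegate stands in for the estimator's real hash function,
    // which this sketch does not reproduce.
    public static Dictionary<uint, List<string>> BuildInverts(
        IEnumerable<string> values,
        Func<string, uint> hash,
        int maximumNumberOfInverts)
    {
        if (maximumNumberOfInverts < -1)
            throw new ArgumentOutOfRangeException(nameof(maximumNumberOfInverts));

        var inverts = new Dictionary<uint, List<string>>();
        if (maximumNumberOfInverts == 0)
            return inverts; // 0 means no mapping is retained.

        foreach (var value in values)
        {
            uint key = hash(value);
            if (!inverts.TryGetValue(key, out var kept))
            {
                kept = new List<string>();
                inverts[key] = kept;
            }

            // -1 means unlimited; otherwise stop once the cap is reached.
            bool hasRoom = maximumNumberOfInverts == -1
                || kept.Count < maximumNumberOfInverts;
            if (hasRoom && !kept.Contains(value))
                kept.Add(value);
        }
        return inverts;
    }
}
```

Under these assumptions, HashSketch.BucketCount(16) yields 65536. With a deliberately colliding hash such as s => (uint)s.Length, calling BuildInverts on "MLB", "NFL", and "MLS" with a cap of 2 retains only "MLB" and "NFL" under key 3, which shows why a small cap can hide some of the original values when collisions occur.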
Returns
HashingEstimator
Examples
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer
    // data types.
    public static class Hash
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB", Age = 18 },
                new DataPoint() { Category = "NFL", Age = 14 },
                new DataPoint() { Category = "NFL", Age = 15 },
                new DataPoint() { Category = "MLB", Age = 18 },
                new DataPoint() { Category = "MLS", Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that hashes the two columns and stores the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging
            // or model explainability, users need to know which values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms mostly use the hashed values for further
            // computations. The Hash method can preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (the column populated with the hashed values).
            //
            // Setting the maximumNumberOfInverts parameter to -1 preserves the
            // full map. If that parameter is left at its default value of 0, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
                "Category", numberOfBits: 16, maximumNumberOfInverts: -1)
                .Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
                    numberOfBits: 8));

            // Fit the pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the transformed data from the IDataView format to an
            // IEnumerable<TransformedDataPoint> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age   AgeHashed
            // MLB      36206            18    127
            // NFL      19015            14    62
            // NFL      19015            15    43
            // MLB      36206            18    127
            // MLS      6013             14    62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories and their
            // correspondence with the generated hash values are preserved in the
            // Annotations as indices and values. The indices array holds the
            // hashed values, and the element at the same position in the values
            // array holds the original value. The indices are zero-based, so each
            // one is one less than the corresponding hash value printed above.
            //
            // See below for an example of how to retrieve the mapping.
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            //
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }
    }
}