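DataOperationsCatalog.LoadFromEnumerable Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads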
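LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition) | Creates a new IDataView over an enumerable of items of a user-defined type. The user retains ownership of the data, and the resulting data view never alters its contents. Because an IDataView is assumed to be immutable, the data should support multiple enumerations that return the same results, unless the user knows the data will be cursored only once. A typical use of a streaming data view is to create a data view that lazily loads data as needed, apply pre-trained transformations to it, and then cursor through it to obtain the transformation results.
LoadFromEnumerable<TRow>(IEnumerable<TRow>, DataViewSchema) | Creates a new IDataView over an enumerable of items of a user-defined type, using the provided DataViewSchema, which might contain more information about the schema than the type can capture.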
LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition)
public Microsoft.ML.IDataView LoadFromEnumerable<TRow> (System.Collections.Generic.IEnumerable<TRow> data, Microsoft.ML.Data.SchemaDefinition schemaDefinition = default) where TRow : class;
member this.LoadFromEnumerable : seq<'Row (requires 'Row : null)> * Microsoft.ML.Data.SchemaDefinition -> Microsoft.ML.IDataView (requires 'Row : null)
Public Function LoadFromEnumerable(Of TRow As Class) (data As IEnumerable(Of TRow), Optional schemaDefinition As SchemaDefinition = Nothing) As IDataView
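Type Parameters
- TRow
The user-defined item type.
Parameters
- data
- IEnumerable<TRow>
The enumerable data of type TRow to convert to an IDataView.
- schemaDefinition
- SchemaDefinition
The optional schema definition of the data view to create. If null, the schema definition is inferred from TRow.
Returns
The constructed IDataView.
Examples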
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic
{
    public static class LoadFromEnumerable
    {
        // Creating IDataView from IEnumerable, and setting the size of the vector
        // at runtime. When the data model is defined through types, setting the
        // size of the vector is done through the VectorType annotation. When the
        // size of the data is not known at compile time, the Schema can be directly
        // modified at runtime and the size of the vector set there. This is
        // important, because most of the ML.NET trainers require the Features
        // vector to be of known size.
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness.
            var mlContext = new MLContext();
            // Get a small dataset as an IEnumerable. Each Features array has
            // exactly 5 elements, matching the VectorType(5) annotation.
            IEnumerable<DataPointVector> enumerableKnownSize = new DataPointVector[]
            {
                new DataPointVector{ Features = new float[]{ 1.2f, 3.4f, 4.5f, 3.2f,
                    7.5f } },
                new DataPointVector{ Features = new float[]{ 4.2f, 3.4f, 14.65f,
                    3.2f, 3.5f } },
                new DataPointVector{ Features = new float[]{ 1.6f, 3.5f, 4.5f, 6.2f,
                    3.5f } },
            };
            // Load dataset into an IDataView.
            IDataView data = mlContext.Data.LoadFromEnumerable(enumerableKnownSize);
            var featureColumn = data.Schema["Features"].Type as VectorDataViewType;
            // Inspecting the schema
            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");
            // Expected output:
            //
            // Is the size of the Features column known: True.
            // Size: 5
            // If the size of the vector is unknown at compile time, it can be set
            // at runtime.
            IEnumerable<DataPoint> enumerableUnknownSize = new DataPoint[]
            {
                new DataPoint{ Features = new float[]{ 1.2f, 3.4f, 4.5f } },
                new DataPoint{ Features = new float[]{ 4.2f, 3.4f, 1.6f } },
                new DataPoint{ Features = new float[]{ 1.6f, 3.5f, 4.5f } },
            };
            // The feature dimension (typically this will be the Count of the array
            // of the features vector known at runtime).
            int featureDimension = 3;
            var definedSchema = SchemaDefinition.Create(typeof(DataPoint));
            featureColumn = definedSchema["Features"]
                .ColumnType as VectorDataViewType;
            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");
            // Expected output:
            //
            // Is the size of the Features column known: False.
            // Size: 0
            // Set the column type to be a known-size vector.
            var vectorItemType = ((VectorDataViewType)definedSchema[0].ColumnType)
                .ItemType;
            definedSchema[0].ColumnType = new VectorDataViewType(vectorItemType,
                featureDimension);
            // Read the data into an IDataView, supplying the modified schema.
            IDataView data2 = mlContext.Data
                .LoadFromEnumerable(enumerableUnknownSize, definedSchema);
            featureColumn = data2.Schema["Features"].Type as VectorDataViewType;
            // Inspecting the schema
            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");
            // Expected output:
            //
            // Is the size of the Features column known: True.
            // Size: 3
        }
    }
    // Row type whose Features vector has no size declared at compile time.
    public class DataPoint
    {
        public float[] Features { get; set; }
    }
    // Row type whose Features vector size is fixed through the VectorType annotation.
    public class DataPointVector
    {
        [VectorType(5)]
        public float[] Features { get; set; }
    }
}
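The streaming usage described for this overload (a data view that loads rows lazily and is then cursored) can be sketched as follows. This is a minimal illustration rather than an official sample: the Reading type, the GenerateReadings iterator, and the LazyLoadSketch class are hypothetical names introduced here, and the step of applying a pre-trained transformation is left out to keep the sketch short. A C# iterator method suits the requirement that the source support repeated enumeration, because each enumeration restarts the iterator and yields the same rows.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class LazyLoadSketch
{
    // Hypothetical row type used only in this sketch.
    public class Reading
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    // Hypothetical lazy source: a row is created only when the enumerator
    // advances, and every new enumeration starts again from the first row.
    private static IEnumerable<Reading> GenerateReadings(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return new Reading
            {
                Features = new float[] { i, i + 0.5f, i + 1f }
            };
        }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Wrap the lazy sequence in an IDataView. The schema is inferred
        // from Reading because no SchemaDefinition is supplied.
        IDataView view = mlContext.Data.LoadFromEnumerable(GenerateReadings(3));

        // Cursor through the view by converting it back to typed rows.
        // reuseRowObject: false gives each row its own object.
        foreach (var row in mlContext.Data.CreateEnumerable<Reading>(
            view, reuseRowObject: false))
        {
            Console.WriteLine(string.Join(", ", row.Features));
        }
    }
}
```

In a fuller pipeline, a fitted transformer's Transform call would sit between LoadFromEnumerable and CreateEnumerable.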
LoadFromEnumerable<TRow>(IEnumerable<TRow>, DataViewSchema)
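Creates a new IDataView over an enumerable of items of a user-defined type, using the provided DataViewSchema, which might contain more information about the schema than the type can capture.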
public Microsoft.ML.IDataView LoadFromEnumerable<TRow> (System.Collections.Generic.IEnumerable<TRow> data, Microsoft.ML.DataViewSchema schema) where TRow : class;
member this.LoadFromEnumerable : seq<'Row (requires 'Row : null)> * Microsoft.ML.DataViewSchema -> Microsoft.ML.IDataView (requires 'Row : null)
Public Function LoadFromEnumerable(Of TRow As Class) (data As IEnumerable(Of TRow), schema As DataViewSchema) As IDataView
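Type Parameters
- TRow
The user-defined item type.
Parameters
- data
- IEnumerable<TRow>
The enumerable data of type TRow to convert to an IDataView.
- schema
- DataViewSchema
The schema of the returned IDataView.
Returns
An IDataView with the given schema.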
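Remarks
The user retains ownership of data, and the resulting data view never alters the contents of data. Because an IDataView is assumed to be immutable, the user is expected to support multiple enumerations of data that return the same results, unless the user knows that the data will be cursored only once. A typical use of a streaming data view is to create a data view that lazily loads data as needed, apply pre-trained transformations to it, and then cursor through it to obtain the transformation results. One practical use of this overload is to supply the feature column names through DataViewSchema.Annotations.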
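The last remark, supplying feature column names through DataViewSchema.Annotations, might look like the sketch below. It is an illustration under stated assumptions, not an official sample: the HousingRow type, the three feature names, and the SlotNamesSketch class are hypothetical, and the sketch assumes the annotations attached to the supplied schema are carried over to the column of the same name in the resulting view. The row type declares a fixed vector size so that the number of slot names can match it.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SlotNamesSketch
{
    // Hypothetical row type; the vector size is fixed so that the
    // slot names can line up with the vector's slots.
    public class HousingRow
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var rows = new[]
        {
            new HousingRow { Features = new float[] { 120f, 3f, 1995f } },
            new HousingRow { Features = new float[] { 85f, 2f, 2008f } },
        };

        // Illustrative feature names, one per slot of the Features vector.
        var names = new[] { "Area", "Rooms", "YearBuilt" };
        var slotNames = new ReadOnlyMemory<char>[names.Length];
        for (int i = 0; i < names.Length; i++)
            slotNames[i] = names[i].AsMemory();

        // Build the slot-name annotation for the Features column.
        var annotations = new DataViewSchema.Annotations.Builder();
        annotations.AddSlotNames(slotNames.Length,
            (ref VBuffer<ReadOnlyMemory<char>> dst) =>
                dst = new VBuffer<ReadOnlyMemory<char>>(
                    slotNames.Length, slotNames));

        // Describe a schema whose Features column carries the annotation.
        var schemaBuilder = new DataViewSchema.Builder();
        schemaBuilder.AddColumn("Features",
            new VectorDataViewType(NumberDataViewType.Single, slotNames.Length),
            annotations.ToAnnotations());
        DataViewSchema schema = schemaBuilder.ToSchema();

        // Use the DataViewSchema overload to attach the schema to the data.
        IDataView data = mlContext.Data.LoadFromEnumerable(rows, schema);

        // Read the slot names back from the resulting view, if present.
        var column = data.Schema["Features"];
        if (column.HasSlotNames())
        {
            VBuffer<ReadOnlyMemory<char>> loaded = default;
            column.GetSlotNames(ref loaded);
            foreach (var name in loaded.DenseValues())
                Console.WriteLine(name.ToString());
        }
    }
}
```

Named slots of this kind are what let downstream tooling report features by name instead of by index.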