TextLoaderSaverCatalog.CreateTextLoader Method

Reference

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Data.dll

Package: Microsoft.ML v3.0.1

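Important

Some information relates to prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.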

Overloads

CreateTextLoader(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)	Creates a text loader (TextLoader).
CreateTextLoader(DataOperationsCatalog, TextLoader+Column[], Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)	Creates a text loader (TextLoader).
CreateTextLoader<TInput>(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)	Creates a text loader (TextLoader) by inferring the dataset schema from a data model type.
CreateTextLoader<TInput>(DataOperationsCatalog, Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)	Creates a text loader (TextLoader) by inferring the dataset schema from a data model type.

CreateTextLoader(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)

Creates a text loader (TextLoader).

public static Microsoft.ML.Data.TextLoader CreateTextLoader (this Microsoft.ML.DataOperationsCatalog catalog, Microsoft.ML.Data.TextLoader.Options options, Microsoft.ML.Data.IMultiStreamSource dataSample = default);

static member CreateTextLoader : Microsoft.ML.DataOperationsCatalog * Microsoft.ML.Data.TextLoader.Options * Microsoft.ML.Data.IMultiStreamSource -> Microsoft.ML.Data.TextLoader

<Extension()>
Public Function CreateTextLoader (catalog As DataOperationsCatalog, options As TextLoader.Options, Optional dataSample As IMultiStreamSource = Nothing) As TextLoader

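Parameters

catalog: DataOperationsCatalog

The DataOperationsCatalog catalog.

options: TextLoader.Options

Defines the settings of the load operation.

dataSample: IMultiStreamSource

The optional location of a data sample. The sample can be used to infer slot name annotations, if present, and the number of slots in Columns defined with a TextLoader.Range whose maximum index is null. If the sample was saved with ML.NET's SaveAsText(DataOperationsCatalog, IDataView, Stream, Char, Boolean, Boolean, Boolean, Boolean), its header also contains schema information that the loader can read even if Columns is not specified. To use the schema defined in the file, all other TextLoader.Options should be left at their default values.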

Returns

TextLoader
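
As a minimal sketch of this overload, the following program describes a comma-separated file entirely through TextLoader.Options and reads the rows back. The file name, the Row class, and the column names are assumptions made for illustration; they are not part of the API.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class OptionsLoaderSketch
{
    // Hypothetical row type, used only to read the loaded rows back.
    private class Row
    {
        public string Name { get; set; }
        public float Score { get; set; }
    }

    public static void Main()
    {
        // Write a small comma-separated sample file with a header line.
        var path = Path.Combine(Path.GetTempPath(), "options_sample.csv");
        File.WriteAllLines(path, new[] { "Name,Score", "alpha,1.5", "beta,2.5" });

        var mlContext = new MLContext();

        // Describe the file explicitly: separator, header, and column schema.
        var options = new TextLoader.Options
        {
            Separators = new[] { ',' },
            HasHeader = true,
            Columns = new[]
            {
                new TextLoader.Column("Name", DataKind.String, 0),
                new TextLoader.Column("Score", DataKind.Single, 1)
            }
        };

        // dataSample is omitted here because the schema is given in Columns.
        var loader = mlContext.Data.CreateTextLoader(options);
        IDataView data = loader.Load(path);

        // Materialize the lazily evaluated IDataView as Row objects.
        foreach (var row in mlContext.Data.CreateEnumerable<Row>(data, reuseRowObject: false))
            Console.WriteLine($"{row.Name}: {row.Score}");
    }
}
```

Because Columns is set explicitly, this sketch does not rely on a schema header in the file; leaving Columns unset and passing a dataSample saved by SaveAsText is the alternative described above.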

CreateTextLoader(DataOperationsCatalog, TextLoader+Column[], Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)

Creates a text loader (TextLoader).

public static Microsoft.ML.Data.TextLoader CreateTextLoader (this Microsoft.ML.DataOperationsCatalog catalog, Microsoft.ML.Data.TextLoader.Column[] columns, char separatorChar = '\t', bool hasHeader = false, Microsoft.ML.Data.IMultiStreamSource dataSample = default, bool allowQuoting = false, bool trimWhitespace = false, bool allowSparse = false);

static member CreateTextLoader : Microsoft.ML.DataOperationsCatalog * Microsoft.ML.Data.TextLoader.Column[] * char * bool * Microsoft.ML.Data.IMultiStreamSource * bool * bool * bool -> Microsoft.ML.Data.TextLoader

<Extension()>
Public Function CreateTextLoader (catalog As DataOperationsCatalog, columns As TextLoader.Column(), Optional separatorChar As Char = '\t', Optional hasHeader As Boolean = false, Optional dataSample As IMultiStreamSource = Nothing, Optional allowQuoting As Boolean = false, Optional trimWhitespace As Boolean = false, Optional allowSparse As Boolean = false) As TextLoader

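Parameters

catalog: DataOperationsCatalog

The DataOperationsCatalog catalog.

columns: TextLoader.Column[]

The array of TextLoader.Column columns that defines the schema.

separatorChar: Char

The character used as the separator between data points in a row. By default, the tab character is used as the separator.

hasHeader: Boolean

Whether the file has a header with feature names. When a dataSample is provided, true indicates that the first line of the dataSample will be used for feature names, and that the first line will be skipped when Load(IMultiStreamSource) is called. When no dataSample is provided, true only indicates that the loader should skip the first line when Load(IMultiStreamSource) is called; the columns will not have slot name annotations. This is because the output schema is built when the loader is created, not when Load(IMultiStreamSource) is called.

dataSample: IMultiStreamSource

The optional location of a data sample. The sample can be used to infer slot name annotations, if present, and the number of slots in a column defined with a TextLoader.Range whose maximum index is null. If the sample was saved with ML.NET's SaveAsText(DataOperationsCatalog, IDataView, Stream, Char, Boolean, Boolean, Boolean, Boolean), its header also contains schema information that the loader can read even if columns is null. To use the schema defined in the file, all other arguments should be left at their default values.

allowQuoting: Boolean

Whether the input may include double-quoted values. This parameter is used to distinguish separator characters inside an input value from actual separators. When true, separators within double quotes are treated as part of the input value. When false, all separators, even those within quotes, are treated as delimiting a new column.

trimWhitespace: Boolean

Remove trailing whitespace from lines.

allowSparse: Boolean

Whether the input may include sparse representations. For example, a row containing "5 2:6 4:3" means that there are 5 columns and that the only non-zero values are in columns 2 and 4, which have values 6 and 3, respectively. Column indices are zero-based, so columns 2 and 4 are the 3rd and 5th columns. A row may also have dense values followed by sparse values represented in this fashion. For example, a row containing "1 2 5 2:6 4:3" represents two dense columns with values 1 and 2, followed by 5 sparsely represented columns with values 0, 0, 6, 0, and 3. The indices of the sparse columns start from 0, even though index 0 then refers to the third column of the row.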
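
To make the sparse notation concrete, here is a small stand-alone sketch that expands the two example rows from the allowSparse description into dense values. It is illustrative only: the Expand helper is hypothetical, it is not the parser that TextLoader uses, and it does not handle a sparse row that contains only a column count.

```csharp
using System;
using System.Globalization;

public static class SparseRowSketch
{
    // Expands a row such as "1 2 5 2:6 4:3" into its dense values.
    public static float[] Expand(string row, char separator = ' ')
    {
        var fields = row.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        // The sparse part starts at the first "index:value" field. The field
        // just before it holds the number of sparsely represented columns.
        int firstSparse = Array.FindIndex(fields, f => f.Contains(":"));
        if (firstSparse < 0)
            return Array.ConvertAll(fields, f => float.Parse(f, CultureInfo.InvariantCulture));
        if (firstSparse == 0)
            throw new FormatException("A column count must precede the first index:value pair.");

        int denseCount = firstSparse - 1;
        int sparseCount = int.Parse(fields[denseCount], CultureInfo.InvariantCulture);
        var values = new float[denseCount + sparseCount];

        // Leading dense values are copied as-is.
        for (int i = 0; i < denseCount; i++)
            values[i] = float.Parse(fields[i], CultureInfo.InvariantCulture);

        // Sparse indices are zero-based relative to the start of the sparse block.
        for (int i = firstSparse; i < fields.Length; i++)
        {
            var pair = fields[i].Split(':');
            int index = int.Parse(pair[0], CultureInfo.InvariantCulture);
            values[denseCount + index] = float.Parse(pair[1], CultureInfo.InvariantCulture);
        }

        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Expand("5 2:6 4:3")));      // 0, 0, 6, 0, 3
        Console.WriteLine(string.Join(", ", Expand("1 2 5 2:6 4:3")));  // 1, 2, 0, 0, 6, 0, 3
    }
}
```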

Returns

TextLoader
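
The interaction between hasHeader and dataSample can be checked with a short sketch like the one below. It assumes a hypothetical three-column tab-separated file and uses the HasSlotNames extension method on the schema column. If the behavior matches the parameter description, the loader created without a sample is expected to report no slot names and the loader created with a sample is expected to report them.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class HeaderSlotNamesSketch
{
    public static void Main()
    {
        // Hypothetical tab-separated file: one header line, two data rows.
        var path = Path.Combine(Path.GetTempPath(), "header_sample.tsv");
        File.WriteAllLines(path, new[] { "f0\tf1\tf2", "1\t2\t3", "4\t5\t6" });

        var mlContext = new MLContext();
        var columns = new[] { new TextLoader.Column("Features", DataKind.Single, 0, 2) };

        // No data sample: the header line is skipped on Load, but the output
        // schema is built without looking at any file.
        var plainLoader = mlContext.Data.CreateTextLoader(columns, hasHeader: true);

        // With a data sample: the header line of the sample is available
        // when the output schema is created.
        var sampledLoader = mlContext.Data.CreateTextLoader(
            columns, hasHeader: true, dataSample: new MultiFileSource(path));

        // Report whether the Features column carries slot name annotations.
        Console.WriteLine(plainLoader.GetOutputSchema()["Features"].HasSlotNames());
        Console.WriteLine(sampledLoader.GetOutputSchema()["Features"].HasSlotNames());
    }
}
```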

Examples

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic.DataOperations
{
    public static class LoadingText
    {
        // This example shows all the ways to load data with TextLoader.
        public static void Example()
        {
            // Create 5 data files to illustrate different loading methods.
            var dataFiles = new List<string>();
            var random = new Random(1);
            var dataDirectoryName = "DataDir";
            Directory.CreateDirectory(dataDirectoryName);
            for (int i = 0; i < 5; i++)
            {
                var fileName = Path.Combine(dataDirectoryName, $"Data_{i}.csv");
                dataFiles.Add(fileName);
                using (var fs = File.CreateText(fileName))
                {
                    // Write without header with 10 random columns, forcing
                    // approximately 80% of values to be 0.
                    for (int line = 0; line < 10; line++)
                    {
                        var sb = new StringBuilder();
                        for (int pos = 0; pos < 10; pos++)
                        {
                            var value = random.NextDouble();
                            sb.Append((value < 0.8 ? 0 : value).ToString() + '\t');
                        }
                        fs.WriteLine(sb.ToString(0, sb.Length - 1));
                    }
                }
            }

            // Create a TextLoader.
            var mlContext = new MLContext();
            var loader = mlContext.Data.CreateTextLoader(
                columns: new[]
                {
                    new TextLoader.Column("Features", DataKind.Single, 0, 9)
                },
                hasHeader: false
            );

            // Load a single file from path.
            var singleFileData = loader.Load(dataFiles[0]);
            PrintRowCount(singleFileData);

            // Expected Output:
            //   10


            // Load all 5 files from path.
            var multipleFilesData = loader.Load(dataFiles.ToArray());
            PrintRowCount(multipleFilesData);

            // Expected Output:
            //   50


            // Load all files using path wildcard.
            var multipleFilesWildcardData =
                loader.Load(Path.Combine(dataDirectoryName, "Data_*.csv"));
            PrintRowCount(multipleFilesWildcardData);

            // Expected Output:
            //   50


            // Create a TextLoader with user defined type.
            var loaderWithCustomType =
                mlContext.Data.CreateTextLoader<Data>(hasHeader: false);

            // Load a single file from path.
            var singleFileCustomTypeData = loaderWithCustomType.Load(dataFiles[0]);
            PrintRowCount(singleFileCustomTypeData);

            // Expected Output:
            //   10


            // Create a TextLoader with unknown column length to illustrate
            // how a data sample may be used to infer column size.
            var dataSample = new MultiFileSource(dataFiles[0]);
            var loaderWithUnknownLength = mlContext.Data.CreateTextLoader(
                columns: new[]
                {
                    new TextLoader.Column("Features",
                                          DataKind.Single,
                                          new[] { new TextLoader.Range(0, null) })
                },
                dataSample: dataSample
            );

            var dataWithInferredLength = loaderWithUnknownLength.Load(dataFiles[0]);
            var featuresColumn = dataWithInferredLength.Schema.GetColumnOrNull("Features");
            if (featuresColumn.HasValue)
                Console.WriteLine(featuresColumn.Value.ToString());

            // Expected Output:
            //   Features: Vector<Single, 10>
            //
            // ML.NET infers the correct length of 10 for the Features column,
            // which is of type Vector<Single>.

            PrintRowCount(dataWithInferredLength);

            // Expected Output:
            //   10


            // Save the data with 10 rows to a text file to illustrate the use of
            // sparse format.
            var sparseDataFileName = Path.Combine(dataDirectoryName, "saved_data.tsv");
            using (FileStream stream = new FileStream(sparseDataFileName, FileMode.Create))
                mlContext.Data.SaveAsText(singleFileData, stream);

            // Since there are many zeroes in the data, it will be saved in a sparse
            // representation to save disk space. The data may be forced to be saved
            // in a dense representation by setting forceDense to true. The sparse
            // data will look like the following:
            //
            //   10 7:0.943862259
            //   10 3:0.989767134
            //   10 0:0.949778438   8:0.823028445   9:0.886469543
            //
            // The sparse representation of the first row indicates that there are
            // 10 columns, the column 7 (8-th column) has value 0.943862259, and other
            // omitted columns have value 0.

            // Create a TextLoader that allows sparse input.
            var sparseLoader = mlContext.Data.CreateTextLoader(
                columns: new[]
                {
                    new TextLoader.Column("Features", DataKind.Single, 0, 9)
                },
                allowSparse: true
            );

            // Load the saved sparse data.
            var sparseData = sparseLoader.Load(sparseDataFileName);
            PrintRowCount(sparseData);

            // Expected Output:
            //   10


            // Create a TextLoader without any column schema using TextLoader.Options.
            // Since the sparse data file was saved with ML.NET, it has the schema
            // encoded in its header that the loader can understand:
            //
            // #@ TextLoader{
            // #@   sep=tab
            // #@   col=Features:R4:0-9
            // #@ }
            //
            // The schema syntax is unimportant since it is only used internally. In
            // short, it tells the loader that the values are separated by tabs, and
            // that columns 0-9 in the text file are to be read into one column named
            // "Features" of type Single (internal type R4).

            var options = new TextLoader.Options()
            {
                AllowSparse = true,
            };
            var dataSampleWithSchema = new MultiFileSource(sparseDataFileName);
            var sparseLoaderWithSchema =
                mlContext.Data.CreateTextLoader(options, dataSample: dataSampleWithSchema);

            // Load the saved sparse data.
            var sparseDataWithSchema = sparseLoaderWithSchema.Load(sparseDataFileName);
            PrintRowCount(sparseDataWithSchema);

            // Expected Output:
            //   10
        }

        private static void PrintRowCount(IDataView idv)
        {
            // IDataView is lazy so we need to iterate through it
            // to get the number of rows.
            long rowCount = 0;
            using (var cursor = idv.GetRowCursor(idv.Schema))
                while (cursor.MoveNext())
                    rowCount++;

            Console.WriteLine(rowCount);
        }

        private class Data
        {
            [LoadColumn(0, 9)]
            public float[] Features { get; set; }
        }
    }
}

CreateTextLoader<TInput>(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)

Creates a text loader (TextLoader) by inferring the dataset schema from a data model type.

public static Microsoft.ML.Data.TextLoader CreateTextLoader<TInput> (this Microsoft.ML.DataOperationsCatalog catalog, Microsoft.ML.Data.TextLoader.Options options, Microsoft.ML.Data.IMultiStreamSource dataSample = default);

static member CreateTextLoader : Microsoft.ML.DataOperationsCatalog * Microsoft.ML.Data.TextLoader.Options * Microsoft.ML.Data.IMultiStreamSource -> Microsoft.ML.Data.TextLoader

<Extension()>
Public Function CreateTextLoader(Of TInput) (catalog As DataOperationsCatalog, options As TextLoader.Options, Optional dataSample As IMultiStreamSource = Nothing) As TextLoader

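Type Parameters

TInput

Parameters

catalog: DataOperationsCatalog

The DataOperationsCatalog catalog.

options: TextLoader.Options

Defines the settings of the load operation. There is no need to specify the Columns field, because the columns are inferred by this method.

dataSample: IMultiStreamSource

The optional location of a data sample. The sample can be used to infer information about the columns, such as slot names.

Returns

TextLoader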

CreateTextLoader<TInput>(DataOperationsCatalog, Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)

Creates a text loader (TextLoader) by inferring the dataset schema from a data model type.

public static Microsoft.ML.Data.TextLoader CreateTextLoader<TInput> (this Microsoft.ML.DataOperationsCatalog catalog, char separatorChar = '\t', bool hasHeader = false, Microsoft.ML.Data.IMultiStreamSource dataSample = default, bool allowQuoting = false, bool trimWhitespace = false, bool allowSparse = false);

static member CreateTextLoader : Microsoft.ML.DataOperationsCatalog * char * bool * Microsoft.ML.Data.IMultiStreamSource * bool * bool * bool -> Microsoft.ML.Data.TextLoader

<Extension()>
Public Function CreateTextLoader(Of TInput) (catalog As DataOperationsCatalog, Optional separatorChar As Char = '\t', Optional hasHeader As Boolean = false, Optional dataSample As IMultiStreamSource = Nothing, Optional allowQuoting As Boolean = false, Optional trimWhitespace As Boolean = false, Optional allowSparse As Boolean = false) As TextLoader

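Type Parameters

TInput

Defines the schema of the data to be loaded. Use public fields or properties decorated with LoadColumnAttribute (and possibly other attributes) to specify the column names and their data types in the schema of the loaded data.

Parameters

catalog: DataOperationsCatalog

The DataOperationsCatalog catalog.

separatorChar: Char

The column separator character. The default is '\t'.

hasHeader: Boolean

dataSample: IMultiStreamSource

The optional location of a data sample. The sample can be used to infer slot name annotations, if present.

allowQuoting: Boolean

Whether the input may include double-quoted values. This parameter is used to distinguish separator characters inside an input value from actual separators. When true, separators within double quotes are treated as part of the input value. When false, all separators, even those within quotes, are treated as delimiting a new column.

trimWhitespace: Boolean

Remove trailing whitespace from lines.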

allowSparse: Boolean

Returns

TextLoader
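
The following sketch shows one way this overload might be used with a comma-separated file that has a header line and a quoted field containing the separator. The Person class, the file name, and the sample rows are assumptions made for illustration. With allowQuoting set to true, the comma inside "Smith, John" is expected to be kept as part of the Name value rather than starting a new column.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TypedLoaderSketch
{
    // Hypothetical data model; LoadColumn maps each member to a file column.
    private class Person
    {
        [LoadColumn(0)]
        public string Name { get; set; }

        [LoadColumn(1)]
        public float Age { get; set; }
    }

    public static void Main()
    {
        // The first data row has a quoted value that contains the separator.
        var path = Path.Combine(Path.GetTempPath(), "typed_sample.csv");
        File.WriteAllLines(path, new[]
        {
            "Name,Age",
            "\"Smith, John\",42",
            "Ada,36"
        });

        var mlContext = new MLContext();

        // The schema comes from Person, so no column array is passed.
        var loader = mlContext.Data.CreateTextLoader<Person>(
            separatorChar: ',',
            hasHeader: true,
            allowQuoting: true);

        IDataView data = loader.Load(path);

        // Materialize the lazily evaluated IDataView as Person objects.
        foreach (var p in mlContext.Data.CreateEnumerable<Person>(data, reuseRowObject: false))
            Console.WriteLine($"{p.Name} | {p.Age}");
    }
}
```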
