BpeTrainer Constructors

Definition

Overloads

BpeTrainer()

Construct a new BpeTrainer object using the default values.

BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String)

Construct a new BpeTrainer object.

BpeTrainer()

Construct a new BpeTrainer object using the default values.

public BpeTrainer ();
Public Sub New ()

Applies to

BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String)

Construct a new BpeTrainer object.

public BpeTrainer (System.Collections.Generic.IEnumerable<Microsoft.ML.Tokenizers.AddedToken>? specialTokens, int minFrequency = 0, int vocabSize = 30000, Microsoft.ML.Tokenizers.ReportProgress? progress = default, int? limitAlphabet = default, System.Collections.Generic.HashSet<char>? initialAlphabet = default, string? continuingSubwordPrefix = default, string? endOfWordSuffix = default);
new Microsoft.ML.Tokenizers.BpeTrainer : seq<Microsoft.ML.Tokenizers.AddedToken> * int * int * Microsoft.ML.Tokenizers.ReportProgress * Nullable<int> * System.Collections.Generic.HashSet<char> * string * string -> Microsoft.ML.Tokenizers.BpeTrainer
Public Sub New (specialTokens As IEnumerable(Of AddedToken), Optional minFrequency As Integer = 0, Optional vocabSize As Integer = 30000, Optional progress As ReportProgress = Nothing, Optional limitAlphabet As Nullable(Of Integer) = Nothing, Optional initialAlphabet As HashSet(Of Char) = Nothing, Optional continuingSubwordPrefix As String = Nothing, Optional endOfWordSuffix As String = Nothing)

Parameters

specialTokens
IEnumerable<AddedToken>

The list of special tokens the model should know of.

minFrequency
Int32

The minimum frequency a pair should have in order to be merged.

vocabSize
Int32

The size of the final vocabulary, including all tokens and the alphabet.

progress
ReportProgress

Callback for the training progress updates.

limitAlphabet
Nullable<Int32>

The maximum number of different characters to keep in the alphabet.

initialAlphabet
HashSet<Char>

The set of characters to include in the initial alphabet.

continuingSubwordPrefix
String

The prefix to be used for every sub-word that is not a beginning-of-word.

endOfWordSuffix
String

The suffix to be used for every sub-word that is an end-of-word.
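The following is a minimal sketch of calling this overload with named arguments, based only on the signature shown above. It assumes a project that references a Microsoft.ML.Tokenizers package version that includes BpeTrainer and that uses C# top-level statements. All argument values are illustrative, not recommended settings. In particular, "##" and "</w>" are markers commonly seen in other subword tokenizers and are assumptions here, not defaults of this API.

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Illustrative values only. The only defaults taken from the signature above
// are minFrequency = 0 and vocabSize = 30000; everything else is an assumption.
var trainer = new BpeTrainer(
    specialTokens: null,          // nullable per the signature; no special tokens in this sketch
    minFrequency: 2,              // a pair must occur at least twice to be merged
    vocabSize: 5000,              // final vocabulary size, including all tokens and the alphabet
    progress: null,               // no progress callback
    limitAlphabet: 100,           // int converts implicitly to int?
    initialAlphabet: new HashSet<char> { 'a', 'b', 'c' },
    continuingSubwordPrefix: "##",   // assumed marker, not a documented default
    endOfWordSuffix: "</w>");        // assumed marker, not a documented default
```

Named arguments keep the call readable given the number of optional parameters, and any optional parameter left out falls back to the default shown in the signature.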

Applies to