BpeTrainer Constructors
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
BpeTrainer() | Construct a new BpeTrainer object using the default values.
BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String) | Construct a new BpeTrainer object.
BpeTrainer()
Construct a new BpeTrainer object using the default values.
public BpeTrainer ();
Public Sub New ()
BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String)
Construct a new BpeTrainer object.
public BpeTrainer (System.Collections.Generic.IEnumerable<Microsoft.ML.Tokenizers.AddedToken>? specialTokens, int minFrequency = 0, int vocabSize = 30000, Microsoft.ML.Tokenizers.ReportProgress? progress = default, int? limitAlphabet = default, System.Collections.Generic.HashSet<char>? initialAlphabet = default, string? continuingSubwordPrefix = default, string? endOfWordSuffix = default);
new Microsoft.ML.Tokenizers.BpeTrainer : seq<Microsoft.ML.Tokenizers.AddedToken> * int * int * Microsoft.ML.Tokenizers.ReportProgress * Nullable<int> * System.Collections.Generic.HashSet<char> * string * string -> Microsoft.ML.Tokenizers.BpeTrainer
Public Sub New (specialTokens As IEnumerable(Of AddedToken), Optional minFrequency As Integer = 0, Optional vocabSize As Integer = 30000, Optional progress As ReportProgress = Nothing, Optional limitAlphabet As Nullable(Of Integer) = Nothing, Optional initialAlphabet As HashSet(Of Char) = Nothing, Optional continuingSubwordPrefix As String = Nothing, Optional endOfWordSuffix As String = Nothing)
Parameters
- specialTokens
- IEnumerable<AddedToken>
The list of special tokens the model should know of.
- minFrequency
- Int32
The minimum frequency a pair should have in order to be merged.
- vocabSize
- Int32
The size of the final vocabulary, including all tokens and the alphabet.
- progress
- ReportProgress
Callback for the training progress updates.
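- limitAlphabet
- Nullable<Int32>
The maximum number of distinct characters to keep in the alphabet.
- initialAlphabet
- HashSet<Char>
The characters to include in the initial alphabet, even if they do not appear in the training data.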
- continuingSubwordPrefix
- String
The prefix to use for every sub-word that does not begin a word.
- endOfWordSuffix
- String
The suffix to use for every sub-word that ends a word.
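Examples

The following is a minimal sketch of calling both overloads, based only on the signatures shown above. All argument values (the frequency threshold, vocabulary size, alphabet, prefix, and suffix) are illustrative assumptions, not recommended settings. Passing null for specialTokens relies on the parameter being declared nullable; how the trainer treats a null list is not stated here. Because this is a prerelease API, the signature may change.

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Parameterless overload: every setting takes its default value.
var defaultTrainer = new BpeTrainer();

// Full overload: specialTokens has no default value, so it must be supplied.
// The remaining parameters are optional and are set here with named arguments;
// progress is omitted and therefore keeps its default.
var customTrainer = new BpeTrainer(
    specialTokens: null,              // assumption: no special tokens needed
    minFrequency: 2,                  // illustrative: merge pairs seen at least twice
    vocabSize: 5000,                  // illustrative: smaller than the default 30000
    limitAlphabet: 100,               // illustrative cap on the alphabet size
    initialAlphabet: new HashSet<char>("abcdefghijklmnopqrstuvwxyz"),
    continuingSubwordPrefix: "##",    // illustrative marker for non-initial sub-words
    endOfWordSuffix: "</w>");         // illustrative marker for word-final sub-words
```

Named arguments keep the call readable given the eight parameters, and they let optional parameters such as progress be skipped without passing placeholder values.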