BpeTrainer Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
The trainer responsible for training the byte pair encoding (BPE) model.
public sealed class BpeTrainer : Microsoft.ML.Tokenizers.Trainer
type BpeTrainer = class
inherit Trainer
Public NotInheritable Class BpeTrainer
Inherits Trainer
Inheritance: Trainer → BpeTrainer
Constructors
BpeTrainer() | Construct a new BpeTrainer object using the default values.
BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String) | Construct a new BpeTrainer object.
Properties
ContinuingSubwordPrefix | Gets the prefix added to every sub-word that does not begin a word.
EndOfWordSuffix | Gets the suffix added to every sub-word that ends a word.
InitialAlphabet | Gets the list of characters to include in the initial alphabet, even if they are not seen in the training dataset. If the strings contain more than one character, only the first one is kept.
LimitAlphabet | Gets the maximum number of distinct characters to keep in the alphabet.
MinFrequency | Gets the minimum frequency a pair must have in order to be merged.
Progress | Set this property when progress should be reported during training. (Inherited from Trainer)
SpecialTokens | Gets the list of special tokens the model should know of.
VocabSize | Gets the size of the final vocabulary, including all tokens and the alphabet.
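The alphabet and affix properties above can be illustrated with a short, language-agnostic sketch. The Python below is not the library's implementation. It assumes the semantics common to BPE trainers: forced initial characters are always kept, the remaining slots go to the most frequent characters, and characters outside the alphabet are dropped. The function names `build_alphabet` and `split_word` are hypothetical.

```python
from collections import Counter


def build_alphabet(word_counts, initial_alphabet=(), limit_alphabet=None):
    """Pick the starting characters: forced ones first, then the most frequent."""
    freq = Counter()
    for word, count in word_counts.items():
        for ch in word:
            freq[ch] += count
    forced = set(initial_alphabet)
    # Rank the remaining characters by frequency (ties broken alphabetically).
    ranked = sorted((c for c in freq if c not in forced),
                    key=lambda c: (-freq[c], c))
    if limit_alphabet is not None:
        ranked = ranked[: max(limit_alphabet - len(forced), 0)]
    return sorted(forced | set(ranked))


def split_word(word, alphabet, continuing_subword_prefix="", end_of_word_suffix=""):
    """Split a word into single-character pieces and decorate them."""
    pieces = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        if ch not in alphabet:
            continue  # assumption: characters outside the alphabet are dropped
        piece = ch
        if i > 0:
            piece = continuing_subword_prefix + piece  # not a beginning-of-word
        if i == last:
            piece = piece + end_of_word_suffix  # end-of-word
        pieces.append(piece)
    return pieces


word_counts = {"low": 5, "lower": 2, "zip": 1}
alphabet = build_alphabet(word_counts, initial_alphabet={"q"}, limit_alphabet=5)
pieces = split_word("low", alphabet,
                    continuing_subword_prefix="##", end_of_word_suffix="</w>")
print(alphabet)  # ['e', 'l', 'o', 'q', 'w']
print(pieces)    # ['l', '##o', '##w</w>']
```

In this sketch, "q" is kept although it never occurs in the data, and the rare characters of "zip" fall outside the five-character limit.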
Methods
Feed(IEnumerable<String>, Func<String,IEnumerable<String>>) | Process the input sequences and feed the result to the model.
Train(Model) | Perform the training and update the input model with the new vocabulary and merge data.
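The two-step workflow of feeding sequences and then training can be sketched as a toy BPE loop. This Python is an illustration under stated assumptions, not the library's code. The helpers `feed`, `merge_pair`, and `train` are hypothetical, special tokens are ignored, and training stops when the vocabulary reaches `vocab_size` or no pair reaches `min_frequency`.

```python
from collections import Counter


def feed(sequences, process, word_counts):
    """Split each sequence with `process` and accumulate word counts."""
    for sequence in sequences:
        word_counts.update(process(sequence))


def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    merged = {}
    for symbols, count in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + count
    return merged


def train(word_counts, vocab_size, min_frequency=1):
    """Greedy BPE: merge the most frequent adjacent pair until a stop condition."""
    # Each word starts as a tuple of single characters.
    words = {tuple(word): count for word, count in word_counts.items()}
    vocab = sorted({ch for symbols in words for ch in symbols})
    merges = []
    while len(vocab) < vocab_size:
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best, freq = min(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if freq < min_frequency:
            break
        merges.append(best)
        vocab.append(best[0] + best[1])
        words = merge_pair(words, best)
    return vocab, merges


word_counts = Counter()
feed(["low low lower", "lowest low"], str.split, word_counts)
vocab, merges = train(word_counts, vocab_size=20, min_frequency=2)
print(merges)  # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Here training stops after three merges: every remaining pair occurs only once, which is below the minimum frequency of 2, so the vocabulary ends at 10 entries instead of the requested 20.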