Bpe Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Represents the Byte Pair Encoding model.
public sealed class Bpe : Microsoft.ML.Tokenizers.Model
type Bpe = class
inherit Model
Public NotInheritable Class Bpe
Inherits Model
Inheritance
Object → Model → Bpe
Constructors
Bpe() |
Constructs a new Bpe model object with no tokenization vocabulary. This constructor is useful only in the training scenario. |
Bpe(String, String, String, String, String) |
Constructs a new Bpe model object to use for sentence tokenization and tokenizer training. |
Properties
ContinuingSubwordPrefix |
An optional prefix to use on any sub-word that appears only after another sub-word, that is, one that does not start a word. |
Decoder |
Gets the Bpe decoder object. |
EndOfWordSuffix |
An optional suffix that marks an end-of-word sub-word. |
FuseUnknownTokens |
Gets or sets a value indicating whether multiple unknown tokens are fused together. |
UnknownToken |
Gets or sets the unknown token, which is used when an unknown character is encountered. |
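The effect of ContinuingSubwordPrefix and EndOfWordSuffix is easier to see on a toy example. The following is a minimal, self-contained sketch of one common way BPE applies merge rules to a single word; it is not the implementation behind this class. `BpeSketch`, `Encode`, `RankOf`, the `##` prefix, the `</w>` suffix, and the two merge rules are all hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not part of Microsoft.ML.Tokenizers.
public static class BpeSketch
{
    // Splits a word into characters, decorates them with the optional
    // prefix and suffix, then repeatedly applies the highest-priority merge.
    public static List<string> Encode(
        string word,
        IReadOnlyList<(string Left, string Right)> merges,
        string? continuingSubwordPrefix = null,
        string? endOfWordSuffix = null)
    {
        var symbols = new List<string>();
        for (int i = 0; i < word.Length; i++)
        {
            string s = word[i].ToString();
            // Every piece except the first one continues a word.
            if (i > 0 && continuingSubwordPrefix != null) s = continuingSubwordPrefix + s;
            // Only the last piece closes the word.
            if (i == word.Length - 1 && endOfWordSuffix != null) s += endOfWordSuffix;
            symbols.Add(s);
        }

        while (true)
        {
            // Find the adjacent pair whose merge rule comes earliest in the list.
            int bestRank = int.MaxValue, bestIndex = -1;
            for (int i = 0; i < symbols.Count - 1; i++)
            {
                int rank = RankOf(merges, symbols[i], symbols[i + 1]);
                if (rank >= 0 && rank < bestRank) { bestRank = rank; bestIndex = i; }
            }
            if (bestIndex < 0) break; // no applicable merge is left

            // The merged piece keeps the left side's decoration, so the
            // continuing prefix is dropped from the right side.
            string right = symbols[bestIndex + 1];
            if (continuingSubwordPrefix != null &&
                right.StartsWith(continuingSubwordPrefix, StringComparison.Ordinal))
            {
                right = right.Substring(continuingSubwordPrefix.Length);
            }
            symbols[bestIndex] += right;
            symbols.RemoveAt(bestIndex + 1);
        }
        return symbols;
    }

    // Returns the position of the (left, right) rule in the merge list, or -1.
    private static int RankOf(
        IReadOnlyList<(string Left, string Right)> merges, string left, string right)
    {
        for (int i = 0; i < merges.Count; i++)
        {
            if (merges[i].Left == left && merges[i].Right == right) return i;
        }
        return -1;
    }

    public static void Main()
    {
        // Made-up merge rules, written in decorated form.
        var merges = new List<(string Left, string Right)>
        {
            ("l", "##o"),
            ("lo", "##w</w>"),
        };
        Console.WriteLine(string.Join(" ", Encode("low", merges, "##", "</w>"))); // low</w>
        Console.WriteLine(string.Join(" ", Encode("lot", merges, "##", "</w>"))); // lo ##t</w>
    }
}
```

In the second call no rule covers the final `t`, so it stays a separate piece that carries both the continuing prefix and the end-of-word suffix.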
Methods
GetTrainer() |
Gets a trainer object to use for training the model and generating the vocabulary and merges data. |
GetVocab() |
Gets the dictionary mapping tokens to Ids. |
GetVocabSize() |
Gets the size of the dictionary that maps tokens to Ids. |
IdToString(Int32, Boolean) |
Maps the tokenized Id to the token. |
IdToToken(Int32, Boolean) |
Maps the tokenized Id to the token. |
IsValidChar(Char) | |
Save(String, String) |
Saves the model data into the vocabulary and merges files. |
Tokenize(String) |
Tokenizes a string sequence into a list of tokens. |
TokenToId(String) |
Maps the token to its tokenized Id. |
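The members listed above can be combined roughly as follows. This is a sketch under several assumptions that this page does not confirm: the order of the five constructor arguments (vocabulary file, merges file, unknown token, continuing sub-word prefix, end-of-word suffix), the `Id` and `Value` property names on the returned tokens, the meaning of the Boolean argument of IdToToken, and the file formats (a JSON token-to-Id map and a text file with one space-separated merge pair per line). The file names and contents are made up for illustration. Because the API is prerelease, check the signatures of the installed package version before relying on any of this.

```csharp
using System;
using System.IO;
using Microsoft.ML.Tokenizers;

public static class BpeUsageSketch
{
    // Writes a tiny, made-up vocabulary and merges file to a temporary
    // folder and loads them into a Bpe model.
    public static Bpe BuildModel()
    {
        string dir = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), "bpe-sketch")).FullName;
        string vocabPath = Path.Combine(dir, "vocab.json");
        string mergesPath = Path.Combine(dir, "merges.txt");

        // Assumed formats: JSON token-to-Id map, and "left right" merge lines.
        File.WriteAllText(vocabPath, "{\"[UNK]\": 0, \"a\": 1, \"b\": 2, \"ab\": 3}");
        File.WriteAllText(mergesPath, "#version: 0.2\na b\n");

        // Assumed argument order: vocabulary file, merges file, unknown token,
        // continuing sub-word prefix, end-of-word suffix.
        return new Bpe(vocabPath, mergesPath, "[UNK]", null, null);
    }

    public static void Main()
    {
        Bpe model = BuildModel();
        Console.WriteLine($"Vocabulary size: {model.GetVocabSize()}");

        // Token.Id and Token.Value are assumed property names.
        foreach (var token in model.Tokenize("ab"))
        {
            Console.WriteLine($"{token.Id}: {token.Value}");
        }

        // Look a token up by text, then map the Id back to its text.
        int? id = model.TokenToId("ab");
        if (id is int value)
        {
            // The Boolean argument is assumed to control skipping special tokens.
            Console.WriteLine(model.IdToToken(value, false));
        }
    }
}
```

Running the sketch requires a reference to the Microsoft.ML.Tokenizers package in a version that matches the signatures on this page.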