PreTokenizer Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Base class for all pre-tokenizer classes. A PreTokenizer performs the pre-segmentation step, splitting the input text into pieces before tokenization.
public abstract class PreTokenizer
type PreTokenizer = class
Public MustInherit Class PreTokenizer
Inheritance
Object → PreTokenizer
Constructors
PreTokenizer()
Methods
CreateWhiteSpace(IReadOnlyDictionary<String,Int32>)
Create a new instance of the PreTokenizer class that splits the text at whitespace.
CreateWordOrNonWord(IReadOnlyDictionary<String,Int32>)
Create a new instance of the PreTokenizer class that splits the text at word and non-word boundaries. A word is a sequence of alphabetic, numeric, and underscore characters.
CreateWordOrPunctuation(IReadOnlyDictionary<String,Int32>)
Create a new instance of the PreTokenizer class that splits the text at whitespace or punctuation characters.
PreTokenize(ReadOnlySpan<Char>)
Get the offsets and lengths of the tokens relative to the original string.
PreTokenize(String)
Get the offsets and lengths of the tokens relative to the
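Example
The following is a minimal, self-contained sketch of what "offsets and lengths of the tokens relative to the original string" can look like in practice. It does not call the PreTokenizer class. The helper PreTokenizeSketch, its Split method, and both regular-expression patterns are hypothetical and are only an approximation of the whitespace and word-or-non-word splitting described above; the library's actual patterns and return types may differ. The IReadOnlyDictionary<String,Int32> parameter of the factory methods is not modeled here because this page does not describe it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the library.
static class PreTokenizeSketch
{
    // Assumed pattern: a token is a run of non-whitespace characters.
    public const string WhiteSpace = @"\S+";

    // Assumed pattern: a token is a run of word characters
    // (letters, digits, underscore) or a run of non-word, non-space characters.
    public const string WordOrNonWord = @"\w+|[^\w\s]+";

    // Returns one (Offset, Length) pair per token, measured in chars
    // from the start of the original text.
    public static List<(int Offset, int Length)> Split(string text, string pattern)
    {
        var result = new List<(int Offset, int Length)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            result.Add((m.Index, m.Length));
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        string text = "Hello, big_world!";

        // Word-or-non-word splitting yields four pieces:
        // (0, 5) "Hello", (5, 1) ",", (7, 9) "big_world", (16, 1) "!"
        foreach (var (offset, length) in PreTokenizeSketch.Split(text, PreTokenizeSketch.WordOrNonWord))
        {
            Console.WriteLine($"({offset}, {length}) -> \"{text.Substring(offset, length)}\"");
        }

        // Whitespace splitting yields two pieces:
        // (0, 6) "Hello," and (7, 10) "big_world!"
        foreach (var (offset, length) in PreTokenizeSketch.Split(text, PreTokenizeSketch.WhiteSpace))
        {
            Console.WriteLine($"({offset}, {length}) -> \"{text.Substring(offset, length)}\"");
        }
    }
}
```

Because each piece is reported as an offset and a length rather than as a copied substring, a caller can always recover the piece with text.Substring(offset, length) and map later token results back to positions in the original input.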