LexicalTokenizerName Class
- java.lang.Object
- com.azure.core.util.ExpandableStringEnum&lt;T&gt;
- com.azure.search.documents.indexes.models.LexicalTokenizerName
public final class LexicalTokenizerName
extends ExpandableStringEnum<LexicalTokenizerName>
Defines the names of all tokenizers supported by the search engine.
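Tokenizer names are typically referenced when defining a custom analyzer for a search index. The sketch below is a minimal illustration rather than canonical usage; it assumes the CustomAnalyzer model and its setTokenFilters(List) setter from the same azure-search-documents package, and the analyzer name "my_whitespace_analyzer" is hypothetical.

```java
import com.azure.search.documents.indexes.models.CustomAnalyzer;
import com.azure.search.documents.indexes.models.LexicalTokenizerName;
import com.azure.search.documents.indexes.models.TokenFilterName;

import java.util.Arrays;

public class TokenizerNameExample {
    public static void main(String[] args) {
        // Hypothetical analyzer name; WHITESPACE splits text at whitespace,
        // and the LOWERCASE token filter then lowercases each token.
        CustomAnalyzer analyzer = new CustomAnalyzer("my_whitespace_analyzer",
                LexicalTokenizerName.WHITESPACE)
                .setTokenFilters(Arrays.asList(TokenFilterName.LOWERCASE));

        // ExpandableStringEnum values print their service-side string form.
        System.out.println(analyzer.getTokenizerName());
    }
}
```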
Field Summary
Modifier and Type | Field and Description |
---|---|
static final LexicalTokenizerName | CLASSIC - Grammar-based tokenizer that is suitable for processing most European-language documents. |
static final LexicalTokenizerName | EDGE_NGRAM - Tokenizes the input from an edge into n-grams of the given size(s). |
static final LexicalTokenizerName | KEYWORD - Emits the entire input as a single token. |
static final LexicalTokenizerName | LETTER - Divides text at non-letters. |
static final LexicalTokenizerName | LOWERCASE - Divides text at non-letters and converts them to lower case. |
static final LexicalTokenizerName | MICROSOFT_LANGUAGE_STEMMING_TOKENIZER - Divides text using language-specific rules and reduces words to their base forms. |
static final LexicalTokenizerName | MICROSOFT_LANGUAGE_TOKENIZER - Divides text using language-specific rules. |
static final LexicalTokenizerName | NGRAM - Tokenizes the input into n-grams of the given size(s). |
static final LexicalTokenizerName | PATH_HIERARCHY - Tokenizer for path-like hierarchies. |
static final LexicalTokenizerName | PATTERN - Tokenizer that uses regex pattern matching to construct distinct tokens. |
static final LexicalTokenizerName | STANDARD - Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter. |
static final LexicalTokenizerName | UAX_URL_EMAIL - Tokenizes URLs and emails as one token. |
static final LexicalTokenizerName | WHITESPACE - Divides text at whitespace. |
Constructor Summary
Constructor | Description |
---|---|
LexicalTokenizerName() | Deprecated. Use the fromString(String name) factory method. Creates a new instance of LexicalTokenizerName. |
Method Summary
Modifier and Type | Method and Description |
---|---|
static LexicalTokenizerName | fromString(String name) - Creates or finds a LexicalTokenizerName from its string representation. |
static Collection&lt;LexicalTokenizerName&gt; | values() - Gets known LexicalTokenizerName values. |
Methods inherited from ExpandableStringEnum
Methods inherited from java.lang.Object
Field Details
CLASSIC
public static final LexicalTokenizerName CLASSIC
Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html.
EDGE_NGRAM
public static final LexicalTokenizerName EDGE_NGRAM
Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html.
KEYWORD
public static final LexicalTokenizerName KEYWORD
Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html.
LETTER
public static final LexicalTokenizerName LETTER
Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html.
LOWERCASE
public static final LexicalTokenizerName LOWERCASE
Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html.
MICROSOFT_LANGUAGE_STEMMING_TOKENIZER
public static final LexicalTokenizerName MICROSOFT_LANGUAGE_STEMMING_TOKENIZER
Divides text using language-specific rules and reduces words to their base forms.
MICROSOFT_LANGUAGE_TOKENIZER
public static final LexicalTokenizerName MICROSOFT_LANGUAGE_TOKENIZER
Divides text using language-specific rules.
NGRAM
public static final LexicalTokenizerName NGRAM
Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html.
PATH_HIERARCHY
public static final LexicalTokenizerName PATH_HIERARCHY
Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html.
PATTERN
public static final LexicalTokenizerName PATTERN
Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html.
STANDARD
public static final LexicalTokenizerName STANDARD
Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html.
UAX_URL_EMAIL
public static final LexicalTokenizerName UAX_URL_EMAIL
Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html.
WHITESPACE
public static final LexicalTokenizerName WHITESPACE
Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html.
Constructor Details
LexicalTokenizerName
@Deprecated
public LexicalTokenizerName()
Deprecated. Use the fromString(String name) factory method.
Creates a new instance of LexicalTokenizerName.
Method Details
fromString
public static LexicalTokenizerName fromString(String name)
Creates or finds a LexicalTokenizerName from its string representation.
Parameters:
name - a name to look for.
Returns:
the corresponding LexicalTokenizerName.
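As a short, hedged illustration of fromString: a known name is expected to resolve to the matching predefined constant, while an unrecognized name yields a new expandable enum value rather than throwing (the usual ExpandableStringEnum behavior). The string "whitespace" is assumed here to be the service-side name behind the WHITESPACE constant.

```java
import com.azure.search.documents.indexes.models.LexicalTokenizerName;

public class FromStringExample {
    public static void main(String[] args) {
        // A known name resolves to the predefined constant.
        LexicalTokenizerName whitespace = LexicalTokenizerName.fromString("whitespace");
        System.out.println(whitespace.equals(LexicalTokenizerName.WHITESPACE)); // true

        // An unknown name does not throw; it yields a new expandable enum value.
        LexicalTokenizerName custom = LexicalTokenizerName.fromString("not_a_known_tokenizer");
        System.out.println(custom); // prints "not_a_known_tokenizer"
    }
}
```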
values
public static Collection&lt;LexicalTokenizerName&gt; values()
Gets known LexicalTokenizerName values.
Returns:
known LexicalTokenizerName values.
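For instance, values() can be used to enumerate the known tokenizer names, e.g. when validating configuration input; a minimal sketch:

```java
import com.azure.search.documents.indexes.models.LexicalTokenizerName;

public class ValuesExample {
    public static void main(String[] args) {
        // List every tokenizer name the SDK currently knows about.
        for (LexicalTokenizerName name : LexicalTokenizerName.values()) {
            System.out.println(name);
        }
    }
}
```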