LexicalTokenizerName Class

Reference

Package: com.azure.search.documents.indexes.models

Maven Artifact: com.azure:azure-search-documents:11.7.4

java.lang.Object
- com.azure.core.util.ExpandableStringEnum<T>
- - com.azure.search.documents.indexes.models.LexicalTokenizerName

public final class LexicalTokenizerName
extends ExpandableStringEnum<LexicalTokenizerName>

Defines the names of all tokenizers supported by the search engine.

Field Summary

Modifier and Type	Field and Description
static final LexicalTokenizerName	CLASSIC Grammar-based tokenizer that is suitable for processing most European-language documents.
static final LexicalTokenizerName	EDGE_NGRAM Tokenizes the input from an edge into n-grams of the given size(s).
static final LexicalTokenizerName	KEYWORD Emits the entire input as a single token.
static final LexicalTokenizerName	LETTER Divides text at non-letters.
static final LexicalTokenizerName	LOWERCASE Divides text at non-letters and converts the resulting tokens to lowercase.
static final LexicalTokenizerName	MICROSOFT_LANGUAGE_STEMMING_TOKENIZER Divides text using language-specific rules and reduces words to their base forms.
static final LexicalTokenizerName	MICROSOFT_LANGUAGE_TOKENIZER Divides text using language-specific rules.
static final LexicalTokenizerName	NGRAM Tokenizes the input into n-grams of the given size(s).
static final LexicalTokenizerName	PATH_HIERARCHY Tokenizer for path-like hierarchies.
static final LexicalTokenizerName	PATTERN Tokenizer that uses regex pattern matching to construct distinct tokens.
static final LexicalTokenizerName	STANDARD Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter.
static final LexicalTokenizerName	UAX_URL_EMAIL Emits each URL and email address as a single token.
static final LexicalTokenizerName	WHITESPACE Divides text at whitespace.

Constructor Summary

Constructor	Description
LexicalTokenizerName()	Deprecated. Use the fromString(String name) factory method instead. Creates a new instance of LexicalTokenizerName value.

Method Summary

Modifier and Type	Method and Description
static LexicalTokenizerName	fromString(String name) Creates or finds a LexicalTokenizerName from its string representation.
static Collection<LexicalTokenizerName>	values() Gets known LexicalTokenizerName values.
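
The "creates or finds" behavior of fromString can be illustrated with a small stand-in class. This is only a sketch of the expandable string enum pattern, not the SDK source: TokenizerNameSketch and the string values used below are hypothetical, and the real LexicalTokenizerName requires the azure-search-documents dependency.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in that models the pattern; it is not the SDK class.
final class TokenizerNameSketch {
    // Every name seen so far, keyed by its string value.
    private static final Map<String, TokenizerNameSketch> VALUES = new ConcurrentHashMap<>();

    // Predefined constant; the string value here is illustrative only.
    static final TokenizerNameSketch WHITESPACE = fromString("whitespace");

    private final String value;

    private TokenizerNameSketch(String value) {
        this.value = value;
    }

    // Returns the cached instance for a name, creating it on first use.
    static TokenizerNameSketch fromString(String name) {
        if (name == null) {
            return null;
        }
        return VALUES.computeIfAbsent(name, TokenizerNameSketch::new);
    }

    // Returns a snapshot of all names registered so far.
    static Collection<TokenizerNameSketch> values() {
        return new ArrayList<>(VALUES.values());
    }

    @Override
    public String toString() {
        return value;
    }

    public static void main(String[] args) {
        // A known name resolves to the existing constant.
        System.out.println(fromString("whitespace") == WHITESPACE);
        // An unknown name is accepted and added to the known values.
        TokenizerNameSketch custom = fromString("my_custom_tokenizer");
        System.out.println(values().contains(custom));
    }
}
```

The design lets callers pass names the library does not yet define, which a regular Java enum would reject.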

Methods inherited from ExpandableStringEnum

fromString, values, equals, getValue, hashCode, toString

Methods inherited from java.lang.Object

clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Field Details

CLASSIC

public static final LexicalTokenizerName CLASSIC

Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html.

EDGE_NGRAM

public static final LexicalTokenizerName EDGE_NGRAM

Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html.
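
As a rough illustration of what "n-grams from an edge" means, the following sketch emits front-edge prefixes of a single string. It is a simplified model and not the Lucene implementation: the class name, method name, and parameters (minGram, maxGram) are hypothetical, and it works on Java chars rather than Unicode code points.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of front-edge n-gram generation (hypothetical helper).
final class EdgeNGramSketch {
    // Returns the prefixes of input whose lengths fall in [minGram, maxGram].
    static List<String> edgeNGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, input.length());
        for (int size = minGram; size <= longest; size++) {
            grams.add(input.substring(0, size));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("search", 1, 3)); // prints [s, se, sea]
    }
}
```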

KEYWORD

public static final LexicalTokenizerName KEYWORD

Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html.

LETTER

public static final LexicalTokenizerName LETTER

Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html.

LOWERCASE

public static final LexicalTokenizerName LOWERCASE

Divides text at non-letters and converts the resulting tokens to lowercase. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html.
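
The split-at-non-letters behavior can be approximated in a few lines. This is a minimal sketch, not the Lucene code: the class and method names are hypothetical, and it relies on Character.isLetter, which may classify some characters differently from the real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a letter tokenizer that also lowercases (hypothetical helper).
final class LowercaseTokenizerSketch {
    // Collects maximal runs of letters and lowercases each character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World-2024 OK")); // prints [hello, world, ok]
    }
}
```

Dropping the toLowerCase call gives the corresponding sketch for LETTER.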

MICROSOFT_LANGUAGE_STEMMING_TOKENIZER

public static final LexicalTokenizerName MICROSOFT_LANGUAGE_STEMMING_TOKENIZER

Divides text using language-specific rules and reduces words to their base forms.

MICROSOFT_LANGUAGE_TOKENIZER

public static final LexicalTokenizerName MICROSOFT_LANGUAGE_TOKENIZER

Divides text using language-specific rules.

NGRAM

public static final LexicalTokenizerName NGRAM

Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html.
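
For comparison with the edge variant, a sketch of general n-gram generation takes substrings at every start position, not only at the front. The names and parameters below are hypothetical, and the emission order shown (by position, then by length) is an assumption of this sketch rather than a guarantee about the service.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of n-gram generation over a single string (hypothetical helper).
final class NGramSketch {
    // Returns every substring whose length falls in [minGram, maxGram].
    static List<String> nGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < input.length(); start++) {
            for (int size = minGram; size <= maxGram; size++) {
                int end = start + size;
                if (end > input.length()) {
                    break; // longer grams from this position do not fit
                }
                grams.add(input.substring(start, end));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("abcd", 2, 3)); // prints [ab, abc, bc, bcd, cd]
    }
}
```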

PATH_HIERARCHY

public static final LexicalTokenizerName PATH_HIERARCHY

Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html.
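
To make "path-like hierarchies" concrete, the sketch below emits one token per ancestor of a path, ending with the full path. It is a simplified model with hypothetical names; options of the real tokenizer such as replacement characters, skipping levels, or reverse order are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of path hierarchy tokenization (hypothetical helper).
final class PathHierarchySketch {
    // Emits each prefix of the path that ends just before a delimiter, then the full path.
    static List<String> tokenize(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        // Start at 1 so a leading delimiter does not produce an empty token.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            tokens.add(path);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [/usr, /usr/local, /usr/local/bin]
        System.out.println(tokenize("/usr/local/bin", '/'));
    }
}
```

Indexing every ancestor this way lets a query for a parent folder match documents stored anywhere beneath it.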

PATTERN

public static final LexicalTokenizerName PATTERN

Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html.
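
One way to picture pattern-based tokenization is to treat the regex as a separator and keep the text between matches. The sketch below does this with java.util.regex; the class name, method name, and the example pattern are hypothetical, and it models only this split-style use of a pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model of split-style pattern tokenization (hypothetical helper).
final class PatternTokenizerSketch {
    // Splits the input wherever the separator regex matches and drops empty pieces.
    static List<String> tokenize(String input, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorRegex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Separator: a comma or semicolon followed by optional whitespace.
        System.out.println(tokenize("a, b;c", "[,;]\\s*")); // prints [a, b, c]
    }
}
```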

STANDARD

public static final LexicalTokenizerName STANDARD

Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html.

UAX_URL_EMAIL

public static final LexicalTokenizerName UAX_URL_EMAIL

Emits each URL and email address as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html.

WHITESPACE

public static final LexicalTokenizerName WHITESPACE

Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html.

Constructor Details