LexicalTokenizerName Class

public final class LexicalTokenizerName
extends ExpandableStringEnum<LexicalTokenizerName>

Defines the names of all tokenizers supported by the search engine.
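Because it extends ExpandableStringEnum, this class behaves like an open-ended enum: named constants are interned by their string value, and strings the SDK does not know about can still be represented. The following is a simplified, self-contained sketch of that pattern (the class name and the tokenizer strings are illustrative; the real class delegates to azure-core's ExpandableStringEnum rather than implementing this itself):

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the expandable-string-enum pattern behind
// LexicalTokenizerName. Constants are created through the same
// create-or-find lookup that callers use, so equal names always
// resolve to the same cached instance.
final class TokenizerNameSketch {
    private static final Map<String, TokenizerNameSketch> VALUES = new ConcurrentHashMap<>();

    // Illustrative constants; the real service strings may differ.
    public static final TokenizerNameSketch CLASSIC = fromString("classic");
    public static final TokenizerNameSketch KEYWORD = fromString("keyword_v2");

    private final String name;

    private TokenizerNameSketch(String name) {
        this.name = name;
    }

    // Creates or finds a value from its string representation.
    public static TokenizerNameSketch fromString(String name) {
        return VALUES.computeIfAbsent(name, TokenizerNameSketch::new);
    }

    // Gets all values seen so far.
    public static Collection<TokenizerNameSketch> values() {
        return VALUES.values();
    }

    @Override
    public String toString() {
        return name;
    }
}
```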

Field Summary

Modifier and Type Field and Description
static final LexicalTokenizerName CLASSIC

Grammar-based tokenizer that is suitable for processing most European-language documents.

static final LexicalTokenizerName EDGE_NGRAM

Tokenizes the input from an edge into n-grams of the given size(s).

static final LexicalTokenizerName KEYWORD

Emits the entire input as a single token.

static final LexicalTokenizerName LETTER

Divides text at non-letters.

static final LexicalTokenizerName LOWERCASE

Divides text at non-letters and converts them to lower case.

static final LexicalTokenizerName MICROSOFT_LANGUAGE_STEMMING_TOKENIZER

Divides text using language-specific rules and reduces words to their base forms.

static final LexicalTokenizerName MICROSOFT_LANGUAGE_TOKENIZER

Divides text using language-specific rules.

static final LexicalTokenizerName NGRAM

Tokenizes the input into n-grams of the given size(s).

static final LexicalTokenizerName PATH_HIERARCHY

Tokenizer for path-like hierarchies.

static final LexicalTokenizerName PATTERN

Tokenizer that uses regex pattern matching to construct distinct tokens.

static final LexicalTokenizerName STANDARD

Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter.

static final LexicalTokenizerName UAX_URL_EMAIL

Tokenizes URLs and emails as one token.

static final LexicalTokenizerName WHITESPACE

Divides text at whitespace.

Constructor Summary

Constructor Description
LexicalTokenizerName()

Deprecated

Use the fromString(String name) factory method.

Creates a new instance of LexicalTokenizerName value.

Method Summary

Modifier and Type Method and Description
static LexicalTokenizerName fromString(String name)

Creates or finds a LexicalTokenizerName from its string representation.

static Collection<LexicalTokenizerName> values()

Gets known LexicalTokenizerName values.

Methods inherited from ExpandableStringEnum

Methods inherited from java.lang.Object

Field Details

CLASSIC

public static final LexicalTokenizerName CLASSIC

Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html.

EDGE_NGRAM

public static final LexicalTokenizerName EDGE_NGRAM

Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html.

KEYWORD

public static final LexicalTokenizerName KEYWORD

Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html.

LETTER

public static final LexicalTokenizerName LETTER

Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html.

LOWERCASE

public static final LexicalTokenizerName LOWERCASE

Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html.

MICROSOFT_LANGUAGE_STEMMING_TOKENIZER

public static final LexicalTokenizerName MICROSOFT_LANGUAGE_STEMMING_TOKENIZER

Divides text using language-specific rules and reduces words to their base forms.

MICROSOFT_LANGUAGE_TOKENIZER

public static final LexicalTokenizerName MICROSOFT_LANGUAGE_TOKENIZER

Divides text using language-specific rules.

NGRAM

public static final LexicalTokenizerName NGRAM

Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html.

PATH_HIERARCHY

public static final LexicalTokenizerName PATH_HIERARCHY

Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html.

PATTERN

public static final LexicalTokenizerName PATTERN

Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html.

STANDARD

public static final LexicalTokenizerName STANDARD

Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html.

UAX_URL_EMAIL

public static final LexicalTokenizerName UAX_URL_EMAIL

Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html.

WHITESPACE

public static final LexicalTokenizerName WHITESPACE

Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html.

Constructor Details

LexicalTokenizerName

@Deprecated
public LexicalTokenizerName()

Deprecated

Use the fromString(String name) factory method.

Creates a new instance of LexicalTokenizerName value.

Method Details

fromString

public static LexicalTokenizerName fromString(String name)

Creates or finds a LexicalTokenizerName from its string representation.

Parameters:

name - a name to look for.

Returns:

the corresponding LexicalTokenizerName.
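Unlike a standard Java enum's valueOf, fromString does not throw for an unrecognized name: it creates and caches a new value, which keeps the client forward-compatible with tokenizer names the service may add later. A self-contained sketch of that create-or-find contract (the class name is illustrative and not the SDK's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative create-or-find lookup mirroring the documented
// fromString contract: equal names resolve to the same cached
// instance, and unknown names are admitted rather than rejected.
final class FromStringDemo {
    private static final Map<String, FromStringDemo> CACHE = new ConcurrentHashMap<>();
    private final String name;

    private FromStringDemo(String name) {
        this.name = name;
    }

    public static FromStringDemo fromString(String name) {
        return CACHE.computeIfAbsent(name, FromStringDemo::new);
    }

    @Override
    public String toString() {
        return name;
    }
}
```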

values

public static Collection<LexicalTokenizerName> values()

Gets known LexicalTokenizerName values.

Returns:

known LexicalTokenizerName values.
