Full text search in Azure Cosmos DB for NoSQL (preview)
Azure Cosmos DB for NoSQL now offers a powerful Full Text Search feature in preview, designed to enhance the search capabilities of your applications.
Note
Full Text & Hybrid Search is in early preview and may not be available in all regions at this time.
What is full text search?
Azure Cosmos DB for NoSQL now offers a powerful Full Text Search feature in preview, designed to enhance your data querying capabilities. This feature includes advanced text processing techniques such as stemming, stop word removal, and tokenization, enabling efficient and effective text searches through a specialized text index. Full text search also includes full text scoring with a function that evaluates the relevance of documents to a given search query. BM25, or Best Matching 25, considers factors like term frequency, inverse document frequency, and document length to score and rank documents. This helps ensure that the most relevant documents appear at the top of the search results, improving the accuracy and usefulness of text searches.
Full Text Search is ideal for a variety of scenarios, including:
- E-commerce: Quickly find products based on descriptions, reviews, and other text attributes.
- Content management: Efficiently search through articles, blogs, and documents.
- Customer support: Retrieve relevant support tickets, FAQs, and knowledge base articles.
- User content: Analyze and search through user-generated content such as posts and comments.
- RAG for chatbots: Enhance chatbot responses by retrieving relevant information from large text corpora, improving the accuracy and relevance of answers.
- Multi-Agent AI apps: Enable multiple AI agents to collaboratively search and analyze vast amounts of text data, providing comprehensive and nuanced insights.
How to use full text search
Note
Full Text & Hybrid Search (preview) may not be available in all regions at this time.
- Enable the "Full Text & Hybrid Search for NoSQL" preview feature.
- Configure a container with a full text policy and full text index.
- Insert your data with text properties.
- Run hybrid queries against the data.
Enable the full text and hybrid search for NoSQL preview feature
Full text search, full text scoring, and hybrid search all require enabling the preview feature on your Azure Cosmos DB for NoSQL account before using. Follow the below steps to register:
- Navigate to your Azure Cosmos DB for NoSQL resource page.
- Select the "Features" pane under the "Settings" menu item.
- Select the "Full-Text & Hybrid Search for NoSQL API (preview)" feature.
- Read the description of the feature to confirm you want to enable it.
- Select "Enable" to turn on the vector indexing and search capability.
Configure container policies and indexes for hybrid search
To use full text search capabilities, you'll first need to define two policies:
- A container-level full text policy that defines what paths will contain text for the new full text query system functions.
- A full text index added to the indexing policy that enables efficient search.
Full text policy
For every text property you'd like to configure for full text search, you must declare both the path
of the property with text and the language
of the text. A simple full text policy can be:
{
"defaultLanguage": "en-US",
"fullTextPaths": [
{
"path": "/text",
"language": "en-US"
}
]
}
Defining multiple text paths is easily done by adding another element to the fullTextPolicy
array:
{
"defaultLanguage": "en-US",
"fullTextPaths": [
{
"path": "/text1",
"language": "en-US"
},
{
"path": "/text2",
"language": "en-US"
}
]
}
Note
English ("en-us" as the language) is the only supported language at this time.
Important
Wild card characters (*, []) are not currently supported in the full text policy or full text index.
Full text index
Any full text search operations should make use of a full text index. A full text index can easily be defined in any Azure Cosmos DB for NoSQL index policy per the example below.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
},
],
"fullTextIndexes": [
{
"path": "/text"
}
]
}
Just as with the full text policies, full text indexes can be defined on multiple paths.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
},
],
"fullTextIndexes": [
{
"path": "/text"
},
{
"path": "/text2"
}
]
}
Full text search queries
Full text search and scoring operations are performed using the following system functions in the Azure Cosmos DB for NoSQL query language:
FullTextContains
: Returnstrue
if a given string is contained in the specified property of a document. This is useful in aWHERE
clause when you want to ensure specific key words are included in the documents returned by your query.FullTextContainsAll
: Returnstrue
if all of the given strings are contained in the specified property of a document. This is useful in aWHERE
clause when you want to ensure that multiple key words are included in the documents returned by your query.FullTextContainsAny
: Returnstrue
if any of the given strings are contained in the specified property of a document. This is useful in aWHERE
clause when you want to ensure that at least one of the key words is included in the documents returned by your query.FullTextScore
: Returns a score. This can only be used in anORDER BY RANK
clause, where the returned documents are ordered by the rank of the full text score, with most relevant (highest scoring) documents at the top, and least relevant (lowest scoring) documents at the bottom
Here are a few examples of each function in use.
FullTextContains
In this example, we want to obtain the first 10 results where the keyword "bicycle" is contained in the property c.text
.
SELECT TOP 10 *
FROM c
WHERE FullTextContains(c.text, "bicycle")
FullTextContainsAll
In this example, we want to obtain first 10 results where the keywords "red" and "bicycle" are contained in the property c.text
.
SELECT TOP 10 *
FROM c
WHERE FullTextContainsAll(c.text, "red", "bicycle")
FullTextContainsAny
In this example, we want to obtain the first 10 results where the keywords "red" and either "bicycle" or "skateboard" are contained in the property c.text
.
SELECT TOP 10 *
FROM c
WHERE FullTextContains(c.text, "red") AND FullTextContainsAny(c.text, "bicycle", "skateboard")
FullTextScore
In this example, we want to obtain the first 10 results where "mountain" and "bicycle" are included, and sorted by order of relevance. That is, documents that have these terms more often should appear higher in the list.
SELECT TOP 10 *
FROM c
ORDER BY RANK FullTextScore(c.text, ["bicycle", "mountain"])
Important
FullTextScore can only be used in the ORDER BY RANK
clause and not projected in the SELECT
statement or in a WHERE
clause.