Searching for text with DocumentDB

A common ask among DocumentDB customers is, “How do I search for documents containing some string value?” In this post, we will explore two different ways of doing this, depending on what you are trying to do.

1. Tokenizing words

The first method is easy to implement and works well when your requirements are relatively simple word matching. Consider a document that looks like the JSON below:

{
    "id": "CDC101",
    "title": "Fundamentals of database design",
    "credits": 10
}

 

How would I search for all documents that had the word ‘database’ in the title field? A simple way to do this would be to tokenize the title field and create JSON as below:

{
    "id": "CDC101",
    "title": "Fundamentals of database design",
    "titleWords": [ "fundamentals", "database", "design" ],
    "credits": 10
}

Note: Consider using a regular expression to convert words to lowercase and remove any punctuation. Also strip out stop words such as “to”, “the”, and “of” (https://en.wikipedia.org/wiki/Stop_words).
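The tokenizing step described above could be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the `tokenize` and `add_title_words` helpers and the short `STOP_WORDS` list are assumptions made for this example, and the regular expression only keeps ASCII letters and digits, so it would need adjusting for other alphabets.

```python
import re

# Illustrative stop-word list (an assumption); extend it for real data.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, drop punctuation and stop words, and remove duplicates."""
    # Replace anything that is not an ASCII letter, digit, or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    words = []
    for word in cleaned.split():
        # Keep the first occurrence of each non-stop word, preserving order.
        if word not in STOP_WORDS and word not in words:
            words.append(word)
    return words


def add_title_words(document):
    """Return a copy of the document with a titleWords array added."""
    enriched = dict(document)
    enriched["titleWords"] = tokenize(document["title"])
    return enriched


course = {"id": "CDC101", "title": "Fundamentals of database design", "credits": 10}
enriched_course = add_title_words(course)
print(enriched_course["titleWords"])
```

For the sample document this prints ['fundamentals', 'database', 'design']. Duplicates are removed on purpose: a JOIN over the array returns one result per matching element, so a word stored twice would return the same document twice.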

Now searching for words in the title becomes easy with the following query:

 SELECT r FROM root r JOIN word IN r.titleWords WHERE word = "database" 

 

The query above is very efficient because each word in the array is indexed by default in DocumentDB, which allows for quick equality matching like the one we are doing here. Another great advantage of this approach is that it honors the consistency level of your database, meaning that any changes to your tokenized words are available to queries immediately.
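Because the tokens are stored in lowercase, the search term should be normalized the same way before it is compared. One way to do that is to build the query as a parameterized query, which DocumentDB accepts as query text plus a list of name/value parameters. The sketch below only builds that structure; the `build_title_query` helper is hypothetical, and how the result is handed to a client library depends on the SDK you use, so that part is left out.

```python
def build_title_query(search_word):
    """Build a parameterized query spec that matches one word in titleWords."""
    return {
        "query": "SELECT r FROM root r JOIN word IN r.titleWords WHERE word = @word",
        "parameters": [
            # Trim and lowercase the term so it matches the lowercased tokens.
            {"name": "@word", "value": search_word.strip().lower()},
        ],
    }


spec = build_title_query("  Database ")
print(spec["parameters"])
```

Passing the word as a parameter also avoids having to escape quotes in user input by hand.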

If you have many words to tokenize, cannot accommodate the extra storage required for the additional array of words, or need more complex, multi-faceted full-text search capabilities across multiple fields, then the approach above will not work for you and you need the assistance of a more powerful, full-text-capable search engine.

2. Azure Search

Luckily, Azure has one of these that is very easy to set up and use. It is called Azure Search.

You can set up a data source pointing to your DocumentDB database and have a Search indexer crawl through your data on a predefined schedule.
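As a rough sketch of what those two pieces look like, the Python below builds the JSON bodies for a data source and a scheduled indexer. Treat every detail as an assumption to verify against the Azure Search documentation: the field names follow the Azure Search REST API as best recalled here, and the resource names, the connection string placeholders, and the one-hour interval are made up for illustration. Nothing is sent over the network.

```python
import json


def build_data_source(name, connection_string, collection):
    """Describe a DocumentDB collection as an Azure Search data source (assumed shape)."""
    return {
        "name": name,
        "type": "documentdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": collection},
    }


def build_indexer(name, data_source_name, index_name, interval="PT1H"):
    """Describe an indexer that crawls the data source on a schedule (assumed shape)."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        # ISO 8601 duration: PT1H means "run once every hour".
        "schedule": {"interval": interval},
    }


# Hypothetical names and placeholder credentials; substitute your own values.
data_source = build_data_source(
    "todo-datasource",
    "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=<database>",
    "<collection>",
)
indexer = build_indexer("todo-indexer", data_source["name"], "todo-index")

print(json.dumps(data_source, indent=2))
print(json.dumps(indexer, indent=2))
```

The indexer refers to the data source by name, which is why the two definitions are created in that order.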

For detailed steps on setting this up, check out Connecting DocumentDB with Azure Search using indexers. You can also download a sample ASP.NET MVC web application using DocumentDB and Search together.

In order to run this sample, simply create a Search account and a DocumentDB account. Once you have these, update the web.config with your endpoints and keys (obtainable from the Azure Management Portal).

The download includes sample todo items, which you can import into DocumentDB if you want to start with some canned data, or you can just open the solution in Visual Studio and run the project to start with a clean slate.

Add some todo items, hit the index button to force a manual re-index and then go do some searching.

So, there you have it: searching for text within DocumentDB is easy!

To learn more about DocumentDB, visit our service page. To learn more about DocumentDB query syntax, visit our Query Playground page.

Comments

  • Anonymous
    March 10, 2015
    More details on using Azure Search with DocumentDB and flattening via a query:  blogs.msdn.com/.../indexing-documentdb-with-azure-seach.aspx

  • Anonymous
    May 20, 2015
    Nice, but I'm still waiting for LIKE '%xyz%' and Contains in LINQ.