Freigeben über


Querying JSON documents in Sql Server 2016 and DocumentDB

SQL Server JSON query syntax compared to DocumentDB

In this article I will show you similarities and differences between SQl Server JSON and DocumentDB query syntax.

SQL Server 2016/Azure SQL Db are relational databases that will have support for handling JSON data. JSON support in SQL Server uses standard T-SQL syntax for querying JSON data with some additional built-in functions that enables you to read values of objects from JSON data. JSON can be used with all components/features in SQL Server such as In-memory OLTP, Column store, temporal, etc.

DocumentDB is a write optimized and schema agnostic document database purpose built for JSON and JavaScript; it does not require any schema or secondary indices in order to serve queries. Being a database designed for JSON and JavaScript, DocumentDB uses JavaScript as its underlying type system and supports both SQL as well as JavaScript queries.

Both SQL Server 2016 and DocumentDB enable you to query JSON documents.DocumentDB has nice syntax for querying JSON documents – you can find some good examples on DocumentDB site . Sql Server provides built-in functions for accessing JSON fields (JSON_VALUE), fragments (JSON_QUERY) and opening JSON documents (OPENJSON). In this post, I will show you some queries that can be executed in DocumentDB and equivalent Sql Server queries.

I have used DocumentDB as a reference because it has easy-to-understand syntax, so I believe that you will easily understand how to use equivalent Sql Server queries.

JSON Collections

DocumentDB stores JSON documents in collections. Since Sql Server does not have collections we will create simple table that would contain documents same as in the DocumentDB site:

 CREATE TABLE Families (

 --id int identity constraint PK_JSON_ID primary key,

 doc nvarchar(max) CONSTRAINT [Properly formatted JSON] CHECK (ISJSON(doc)>0)

)

INSERT INTO Families

VALUES(N'{

    "id": "AndersenFamily",

    "lastName": "Andersen",

    "parents": [

       { "firstName": "Thomas" },

       { "firstName": "Mary Kay"}

    ],

    "children": [

       {

           "firstName": "Henriette Thaulow", "gender": "female", "grade": 5,

           "pets": [{ "givenName": "Fluffy" }]

       }

    ],

    "address": { "state": "WA", "county": "King", "city": "seattle" },

    "creationDate": 1431620472,

    "isRegistered": true

}'),

(N'{

    "id": "WakefieldFamily",

    "parents": [

        { "familyName": "Wakefield", "givenName": "Robin" },

        { "familyName": "Miller", "givenName": "Ben" }

    ],

    "children": [

        {

            "familyName": "Merriam",

            "givenName": "Jesse",

            "gender": "female", "grade": 1,

            "pets": [

                { "givenName": "Goofy" },

                { "givenName": "Shadow" }

            ]

        },

        {

            "familyName": "Miller",

             "givenName": "Lisa",

             "gender": "female",

             "grade": 8 }

    ],

    "address": { "state": "NY", "county": "Manhattan", "city": "NY" },

    "creationDate": 1431620462,

    "isRegistered": false

}')

You can optionally add a primary key or any other index if you want; however, this is not required for this tutorial. The same collection is created in DocumentDB examples, so you can easily compare results of execution in both systems.

Basic queries

We can start with the simplest queries. For example, the following query will return the documents where the id field matches AndersenFamily:

 SELECT * 
FROM Families f 
WHERE f.id = "AndersenFamily"

 

Equivalent query in Sql Server would be:

 SELECT doc 
FROM Families f 
WHERE JSON_VALUE(f.doc, '$.id') = N'AndersenFamily'

 

In Sql Server you can use JSON_VALUE function to access value of JSON property on some path.

In DocumentDB you can select all states in address objects in family documents:

 SELECT * 
FROM Families.address.state

 

Equivalent Sql Server query would be:

 SELECT JSON_VALUE(doc, '$.address.state') 
FROM Families
WHERE JSON_VALUE(doc, '$.address.state') IS NOT NULL

In DocumentDB If some families don’t have an address.state value, they will be excluded in the query result. Therefore we need to add WHERE clause to exclude them because in SQL Server they will be returned as NULL values.

This query projects a Name and City, when the address' city has the same name as the state. In this case, "NY, NY" matches.

 

 SELECT {"Name":f.id, "City":f.address.city} AS Family 
FROM Families f 
WHERE f.address.city = f.address.state

 

Equivalent Sql Server query is:

 

 SELECT JSON_VALUE(f.doc, '$.id')  AS Name, JSON_VALUE(f.doc, '$.address.city') AS City
FROM Families f 
WHERE JSON_VALUE(f.doc, '$.address.city') = JSON_VALUE(f.doc, '$.address.state')

 

SQL Server provides you separate function OPENJSON that can open json document in the doc column and you can specify what fields you want to use as well as the types of the fields:

 SELECT Name, City
FROM Families f 
 CROSS APPLY OPENJSON(f.doc)
     WITH (  Name nvarchar(100) '$.id',
          City nvarchar(100) '$.address.city',
            State nvarchar(100) '$.address.state')
WHERE City = State

 

Beside the fact that fields are strongly types you can directly use their aliases in the queries. This might help if you have a lot of properties in the same query.

If you don’t want to use path you can name you property following path syntax:

 SELECT id AS Name, [address.city]  AS City
FROM Families f 
    CROSS APPLY OPENJSON(f.doc)
     WITH (  id nvarchar(100),
               [address.city] nvarchar(100),
               [address.state] nvarchar(100))
WHERE [address.city] = [address.state]

Sql Server will use path syntax in the property names and lookup properties on the same paths.

In DocumentDB the IN keyword can be used to check whether a specified value matches any value in a list. For example, this query returns all family documents where the id is one of "WakefieldFamily" or "AndersenFamily".

 SELECT *
FROM Families 
WHERE Families.id IN ('AndersenFamily', 'WakefieldFamily')

Equivalent Sql Server query would use standard T-Sql IN predicate:

 SELECT *
FROM Families 
WHERE JSON_VALUE(doc, '$.id') IN ('AndersenFamily', 'WakefieldFamily')

Accessing sub entities

If you have complex JSON documents you might want to access various sub-entities such as children or pets in the example above. The following DocumentDB query returns all children objects from family documents:

 SELECT *
FROM Families.children

Equivalent Sql server query would look like:

 SELECT JSON_QUERY(doc, '$.children')
FROM Families 

JSON_QUERY function is similar to JSON_VALUE. The main difference is that it returns JSON fragments (e.g. entire JSON sub-objects or sub-arrays within the documents), while JSON_VALUE returns scalars (i.e. number, strings true/false values). In this case we are using this function to return child objects at $.children path.

Now let's look at another query that performs iteration over children in the collection. Note the difference in the output array. This example splits children and flattens the results into a single array.

 SELECT *
FROM c IN Families.children

Equivalent Sql server query is:

 SELECT c.value
FROM Families f
     CROSS APPLY OPENJSON(f.doc, '$.children') as c

OPENJSON will open all JSON objects in the $.children array and return them as dynamically created table rows. Each JSON element in the array is returned as value column.

In DocumentDB we can combine fragments and scalar values in the same query:

 SELECT f.address
FROM Families f 
WHERE f.id = "AndersenFamily"

 

Equivalent Sql server query would combine JSON_VALUE and JSON_QUERY functions:

 SELECT JSON_QUERY(f.doc, '$.address')
FROM Families f 
WHERE JSON_VALUE(f.doc, '$.id') = N'AndersenFamily'

 

Equivalent query with OPENJSON function is:

 SELECT address
FROM Families f 
CROSS APPLY OPENJSON(f.doc)
WITH (id nvarchar(10), address nvarchar(max) AS JSON)
WHERE id = N'AndersenFamily'

 

Specifying columns in WITH clause is equivalent to JSON_VALUE function. If you want to reference sub-object (like in JSON_QUERY) you would need to specify NVARCHAR(max) type and add AS JSON clause. Without AS JSON clause, OPENJSON will assume that you want scalar property with the same name and since it will not find it, it will return NULL.

The following query returns all family documents in which the first child's grade is between 1-5 (both inclusive):

 SELECT *
FROM Families.children[0] c
WHERE c.grade BETWEEN 1 AND 5

In Sql Server we can use one the following query:

 SELECT f.*
FROM Families f
WHERE CAST(JSON_VALUE(f.doc, '$.children[0].grade') AS int) BETWEEN 1 AND 5

If you want to avoid CAST function, you can use OPENJSON with WITH schema:

 SELECT f.*
FROM Families f
 CROSS APPLY OPENJSON(doc, '$.children[0]') WITH(grade int)  c
WHERE c.grade BETWEEN 1 AND 5

 

Following query returns names of children at eight grades:

 SELECT c.givenName
FROM c IN Families.children
WHERE c.grade = 8

 

Equivalent Sql Server query is:

 SELECT c.givenName
FROM Families f
     CROSS APPLY OPENJSON(f.doc, '$.children')
           WITH(grade int, givenName nvarchar(100))  c
WHERE c.grade = 8

 

JOINS

One of the important functionalities is ability to join parent object with their child entities. In DocumentDB you use JOIN clause to join families and their children the same way as yo would do in Sql Server with relational tables:

 SELECT c.givenName 
FROM Families f 
     JOIN c IN f.children 
WHERE f.id = 'WakefieldFamily'
ORDER BY f.address.city ASC

 

Joins between parent and child objects are similar to joins between parent and child tables in Sql server. If you want to join parent and child JSON entities in Sql Server, you can open it using CROSS APPLY operator:

 SELECT JSON_VALUE(c.value, '$.givenName') AS givenName
FROM Families f 
    CROSS APPLY OPENJSON(f.doc, '$.children') AS c
WHERE JSON_VALUE(f.doc, '$.id') = 'WakefieldFamily'
ORDER BY JSON_VALUE(f.doc, '$.address.city') ASC

 

You can also format results as JSON text using FOR JSON clause:

 

 SELECT JSON_VALUE(c.value, '$.givenName') AS givenName
FROM Families f 
    CROSS APPLY OPENJSON(f.doc, '$.children') AS c
WHERE JSON_VALUE(f.doc, '$.id') = 'WakefieldFamily'
ORDER BY JSON_VALUE(f.doc, '$.address.city') ASC
FOR JSON PATH

 

Behavior of JOIN in DocumentDB is similar to the behavior of CROSS APPLY in Sql Server. In the following example, the result is empty since the cross product of each document from source and an empty set is empty:

 SELECT f.id
FROM Families f
     JOIN f.NonExistent

 

In Sql server you will also not get any results if CROSS APPLY references non-existing array:

 SELECT Families.*
FROM Families
     CROSS APPLY OPENJSON(doc, '$.NonExistent')

 

In the most complex case, you might join several parent/child objects to get something like the following pseudo-code:

 for-each(Family f in Families)
{   
    for-each(Child c in f.children)
    {
        for-each(Pet p in c.pets)
        {
            return (Tuple(f.id AS familyName, 
              c.givenName AS childGivenName, 
              c.firstName AS childFirstName,
              p.givenName AS petName));
        }
    }
}

 

In DocumentDB you can use multiple JOIN conditions:

 SELECT 
    f.id AS familyName,
    c.givenName AS childGivenName,
    c.firstName AS childFirstName,
    p.givenName AS petName 
FROM Families f 
     JOIN c IN f.children 
          JOIN p IN c.pets

 

In Sql Server you would use multiple CROSS APPLY operators, as it is shown in the following query:

 SELECT    JSON_VALUE(f.doc, '$.id') AS familyName,
    JSON_VALUE(c.value, '$.givenName') AS childGivenName,
   JSON_VALUE(c.value, '$.firstName') AS childFirstName,
   JSON_VALUE(p.value, '$.givenName') AS petName 
FROM Families f 
  CROSS APPLY OPENJSON(f.doc, '$.children') as c
      CROSS APPLY OPENJSON (c.value, '$.pets') as p

As an alternative you can use OPENJSON with schema:

 

 SELECT    JSON_VALUE(f.doc, '$.id') AS familyName,
    c.givenName AS childGivenName,
  c.firstName AS childFirstName,
  p.givenName AS petName 
FROM Families f 
 CROSS APPLY OPENJSON(f.value, '$.children') 
    WITH (givenName nvarchar(100), firstName nvarchar(100), pets nvarchar(max) AS JSON) as c
        CROSS APPLY OPENJSON (pets)
     WITH (givenName nvarchar(100))  as p

Or: 

 SELECT    familyName,
 c.givenName AS childGivenName,
  c.firstName AS childFirstName,
  p.givenName AS petName 
FROM Families f 
 CROSS APPLY OPENJSON(f.value) 
      WITH (familyName nvarchar(100), children nvarchar(max) AS JSON)
     CROSS APPLY OPENJSON(children) 
     WITH (givenName nvarchar(100), firstName nvarchar(100), pets nvarchar(max) AS JSON) as c
            CROSS APPLY OPENJSON (pets)
         WITH (givenName nvarchar(100))  as p

 

Although syntax in SQL Server and DocumentDB is slightly different, you might notice that most of the queries you can write in both systems.

 

References

  JSON Data (SQL Server) - MSDN - Microsoft

https://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/

https://azure.microsoft.com/en-us/blog/azure-documentdb-javascript-as-modern-day-t-sql/

Comments

  • Anonymous
    October 06, 2015
    When should you use SQL Server's JSON support over DocumentDB and vice-versa?

  • Anonymous
    October 06, 2015
    The comment has been removed

  • Anonymous
    October 07, 2015
    The comment has been removed

  • Anonymous
    October 07, 2015
    You are right that was a wrong copy/paste. Correct query is: SELECT JSON_QUERY(doc, '$.children') FROM Families Thanks, Jovan

  • Anonymous
    July 11, 2016
    I'm thinking about using the Json functionality in SQL server to replace linktables for m:n relations. For instance, if you have a recipe entity, that can use multiple ingredient entities, and an ingredient can be used in multiple recipes, traditionally you would have a "RecipeIngredient"-table, with a RecipeId, an IngredientId, and the "AmountUsed"Another approach is just have an "IngredientsJson"-field in the recipeentity, where the ingredientId's and AmountUsed per ingredient is represented in a json string. One thing that's lost for sure is the referential integrity of the ingredientId. but I'm wondering on the performance of this approach. Like "find me all recipe's where this ingredient is used for at least that amount". What if you have thousands of recipes ? Any thoughts on this ?

    • Anonymous
      July 14, 2016
      Is there some reason why you want to de-normalize tables? In your case it seems that separate tables will be optimal choice, since you need to query one table based on selected rows from the another one. JSON is good approach if you can identify one primary table and secondary related table that can be represented as an array, so you can avoid joins if you know that you will need to fetch all related ingredients together with receipt. However, if you need to do some filtering based on the values in JSON, SQL Server will need to parse each JSON array to check this condition.You can easily write that kind of query with CROSS APPLY OPENJSON(IngredientsJson), but performance would be better if you keep it in the separate table.