Querying JSON documents in Sql Server 2016 and DocumentDB
SQL Server JSON query syntax compared to DocumentDB
In this article I will show you similarities and differences between SQl Server JSON and DocumentDB query syntax.
SQL Server 2016/Azure SQL Db are relational databases that will have support for handling JSON data. JSON support in SQL Server uses standard T-SQL syntax for querying JSON data with some additional built-in functions that enables you to read values of objects from JSON data. JSON can be used with all components/features in SQL Server such as In-memory OLTP, Column store, temporal, etc.
DocumentDB is a write optimized and schema agnostic document database purpose built for JSON and JavaScript; it does not require any schema or secondary indices in order to serve queries. Being a database designed for JSON and JavaScript, DocumentDB uses JavaScript as its underlying type system and supports both SQL as well as JavaScript queries.
Both SQL Server 2016 and DocumentDB enable you to query JSON documents.DocumentDB has nice syntax for querying JSON documents – you can find some good examples on DocumentDB site . Sql Server provides built-in functions for accessing JSON fields (JSON_VALUE), fragments (JSON_QUERY) and opening JSON documents (OPENJSON). In this post, I will show you some queries that can be executed in DocumentDB and equivalent Sql Server queries.
I have used DocumentDB as a reference because it has easy-to-understand syntax, so I believe that you will easily understand how to use equivalent Sql Server queries.
JSON Collections
DocumentDB stores JSON documents in collections. Since Sql Server does not have collections we will create simple table that would contain documents same as in the DocumentDB site:
CREATE TABLE Families (
--id int identity constraint PK_JSON_ID primary key,
doc nvarchar(max) CONSTRAINT [Properly formatted JSON] CHECK (ISJSON(doc)>0)
)
INSERT INTO Families
VALUES(N'{
"id": "AndersenFamily",
"lastName": "Andersen",
"parents": [
{ "firstName": "Thomas" },
{ "firstName": "Mary Kay"}
],
"children": [
{
"firstName": "Henriette Thaulow", "gender": "female", "grade": 5,
"pets": [{ "givenName": "Fluffy" }]
}
],
"address": { "state": "WA", "county": "King", "city": "seattle" },
"creationDate": 1431620472,
"isRegistered": true
}'),
(N'{
"id": "WakefieldFamily",
"parents": [
{ "familyName": "Wakefield", "givenName": "Robin" },
{ "familyName": "Miller", "givenName": "Ben" }
],
"children": [
{
"familyName": "Merriam",
"givenName": "Jesse",
"gender": "female", "grade": 1,
"pets": [
{ "givenName": "Goofy" },
{ "givenName": "Shadow" }
]
},
{
"familyName": "Miller",
"givenName": "Lisa",
"gender": "female",
"grade": 8 }
],
"address": { "state": "NY", "county": "Manhattan", "city": "NY" },
"creationDate": 1431620462,
"isRegistered": false
}')
You can optionally add a primary key or any other index if you want; however, this is not required for this tutorial. The same collection is created in DocumentDB examples, so you can easily compare results of execution in both systems.
Basic queries
We can start with the simplest queries. For example, the following query will return the documents where the id field matches AndersenFamily:
SELECT *
FROM Families f
WHERE f.id = "AndersenFamily"
Equivalent query in Sql Server would be:
SELECT doc
FROM Families f
WHERE JSON_VALUE(f.doc, '$.id') = N'AndersenFamily'
In Sql Server you can use JSON_VALUE function to access value of JSON property on some path.
In DocumentDB you can select all states in address objects in family documents:
SELECT *
FROM Families.address.state
Equivalent Sql Server query would be:
SELECT JSON_VALUE(doc, '$.address.state')
FROM Families
WHERE JSON_VALUE(doc, '$.address.state') IS NOT NULL
In DocumentDB If some families don’t have an address.state value, they will be excluded in the query result. Therefore we need to add WHERE clause to exclude them because in SQL Server they will be returned as NULL values.
This query projects a Name and City, when the address' city has the same name as the state. In this case, "NY, NY" matches.
SELECT {"Name":f.id, "City":f.address.city} AS Family
FROM Families f
WHERE f.address.city = f.address.state
Equivalent Sql Server query is:
SELECT JSON_VALUE(f.doc, '$.id') AS Name, JSON_VALUE(f.doc, '$.address.city') AS City
FROM Families f
WHERE JSON_VALUE(f.doc, '$.address.city') = JSON_VALUE(f.doc, '$.address.state')
SQL Server provides you separate function OPENJSON that can open json document in the doc column and you can specify what fields you want to use as well as the types of the fields:
SELECT Name, City
FROM Families f
CROSS APPLY OPENJSON(f.doc)
WITH ( Name nvarchar(100) '$.id',
City nvarchar(100) '$.address.city',
State nvarchar(100) '$.address.state')
WHERE City = State
Beside the fact that fields are strongly types you can directly use their aliases in the queries. This might help if you have a lot of properties in the same query.
If you don’t want to use path you can name you property following path syntax:
SELECT id AS Name, [address.city] AS City
FROM Families f
CROSS APPLY OPENJSON(f.doc)
WITH ( id nvarchar(100),
[address.city] nvarchar(100),
[address.state] nvarchar(100))
WHERE [address.city] = [address.state]
Sql Server will use path syntax in the property names and lookup properties on the same paths.
In DocumentDB the IN keyword can be used to check whether a specified value matches any value in a list. For example, this query returns all family documents where the id is one of "WakefieldFamily" or "AndersenFamily".
SELECT *
FROM Families
WHERE Families.id IN ('AndersenFamily', 'WakefieldFamily')
Equivalent Sql Server query would use standard T-Sql IN predicate:
SELECT *
FROM Families
WHERE JSON_VALUE(doc, '$.id') IN ('AndersenFamily', 'WakefieldFamily')
Accessing sub entities
If you have complex JSON documents you might want to access various sub-entities such as children or pets in the example above. The following DocumentDB query returns all children objects from family documents:
SELECT *
FROM Families.children
Equivalent Sql server query would look like:
SELECT JSON_QUERY(doc, '$.children')
FROM Families
JSON_QUERY function is similar to JSON_VALUE. The main difference is that it returns JSON fragments (e.g. entire JSON sub-objects or sub-arrays within the documents), while JSON_VALUE returns scalars (i.e. number, strings true/false values). In this case we are using this function to return child objects at $.children path.
Now let's look at another query that performs iteration over children in the collection. Note the difference in the output array. This example splits children and flattens the results into a single array.
SELECT *
FROM c IN Families.children
Equivalent Sql server query is:
SELECT c.value
FROM Families f
CROSS APPLY OPENJSON(f.doc, '$.children') as c
OPENJSON will open all JSON objects in the $.children array and return them as dynamically created table rows. Each JSON element in the array is returned as value column.
In DocumentDB we can combine fragments and scalar values in the same query:
SELECT f.address
FROM Families f
WHERE f.id = "AndersenFamily"
Equivalent Sql server query would combine JSON_VALUE and JSON_QUERY functions:
SELECT JSON_QUERY(f.doc, '$.address')
FROM Families f
WHERE JSON_VALUE(f.doc, '$.id') = N'AndersenFamily'
Equivalent query with OPENJSON function is:
SELECT address
FROM Families f
CROSS APPLY OPENJSON(f.doc)
WITH (id nvarchar(10), address nvarchar(max) AS JSON)
WHERE id = N'AndersenFamily'
Specifying columns in WITH clause is equivalent to JSON_VALUE function. If you want to reference sub-object (like in JSON_QUERY) you would need to specify NVARCHAR(max) type and add AS JSON clause. Without AS JSON clause, OPENJSON will assume that you want scalar property with the same name and since it will not find it, it will return NULL.
The following query returns all family documents in which the first child's grade is between 1-5 (both inclusive):
SELECT *
FROM Families.children[0] c
WHERE c.grade BETWEEN 1 AND 5
In Sql Server we can use one the following query:
SELECT f.*
FROM Families f
WHERE CAST(JSON_VALUE(f.doc, '$.children[0].grade') AS int) BETWEEN 1 AND 5
If you want to avoid CAST function, you can use OPENJSON with WITH schema:
SELECT f.*
FROM Families f
CROSS APPLY OPENJSON(doc, '$.children[0]') WITH(grade int) c
WHERE c.grade BETWEEN 1 AND 5
Following query returns names of children at eight grades:
SELECT c.givenName
FROM c IN Families.children
WHERE c.grade = 8
Equivalent Sql Server query is:
SELECT c.givenName
FROM Families f
CROSS APPLY OPENJSON(f.doc, '$.children')
WITH(grade int, givenName nvarchar(100)) c
WHERE c.grade = 8
JOINS
One of the important functionalities is ability to join parent object with their child entities. In DocumentDB you use JOIN clause to join families and their children the same way as yo would do in Sql Server with relational tables:
SELECT c.givenName
FROM Families f
JOIN c IN f.children
WHERE f.id = 'WakefieldFamily'
ORDER BY f.address.city ASC
Joins between parent and child objects are similar to joins between parent and child tables in Sql server. If you want to join parent and child JSON entities in Sql Server, you can open it using CROSS APPLY operator:
SELECT JSON_VALUE(c.value, '$.givenName') AS givenName
FROM Families f
CROSS APPLY OPENJSON(f.doc, '$.children') AS c
WHERE JSON_VALUE(f.doc, '$.id') = 'WakefieldFamily'
ORDER BY JSON_VALUE(f.doc, '$.address.city') ASC
You can also format results as JSON text using FOR JSON clause:
SELECT JSON_VALUE(c.value, '$.givenName') AS givenName
FROM Families f
CROSS APPLY OPENJSON(f.doc, '$.children') AS c
WHERE JSON_VALUE(f.doc, '$.id') = 'WakefieldFamily'
ORDER BY JSON_VALUE(f.doc, '$.address.city') ASC
FOR JSON PATH
Behavior of JOIN in DocumentDB is similar to the behavior of CROSS APPLY in Sql Server. In the following example, the result is empty since the cross product of each document from source and an empty set is empty:
SELECT f.id
FROM Families f
JOIN f.NonExistent
In Sql server you will also not get any results if CROSS APPLY references non-existing array:
SELECT Families.*
FROM Families
CROSS APPLY OPENJSON(doc, '$.NonExistent')
In the most complex case, you might join several parent/child objects to get something like the following pseudo-code:
for-each(Family f in Families)
{
for-each(Child c in f.children)
{
for-each(Pet p in c.pets)
{
return (Tuple(f.id AS familyName,
c.givenName AS childGivenName,
c.firstName AS childFirstName,
p.givenName AS petName));
}
}
}
In DocumentDB you can use multiple JOIN conditions:
SELECT
f.id AS familyName,
c.givenName AS childGivenName,
c.firstName AS childFirstName,
p.givenName AS petName
FROM Families f
JOIN c IN f.children
JOIN p IN c.pets
In Sql Server you would use multiple CROSS APPLY operators, as it is shown in the following query:
SELECT JSON_VALUE(f.doc, '$.id') AS familyName,
JSON_VALUE(c.value, '$.givenName') AS childGivenName,
JSON_VALUE(c.value, '$.firstName') AS childFirstName,
JSON_VALUE(p.value, '$.givenName') AS petName
FROM Families f
CROSS APPLY OPENJSON(f.doc, '$.children') as c
CROSS APPLY OPENJSON (c.value, '$.pets') as p
As an alternative you can use OPENJSON with schema:
SELECT JSON_VALUE(f.doc, '$.id') AS familyName,
c.givenName AS childGivenName,
c.firstName AS childFirstName,
p.givenName AS petName
FROM Families f
CROSS APPLY OPENJSON(f.value, '$.children')
WITH (givenName nvarchar(100), firstName nvarchar(100), pets nvarchar(max) AS JSON) as c
CROSS APPLY OPENJSON (pets)
WITH (givenName nvarchar(100)) as p
Or:
SELECT familyName,
c.givenName AS childGivenName,
c.firstName AS childFirstName,
p.givenName AS petName
FROM Families f
CROSS APPLY OPENJSON(f.value)
WITH (familyName nvarchar(100), children nvarchar(max) AS JSON)
CROSS APPLY OPENJSON(children)
WITH (givenName nvarchar(100), firstName nvarchar(100), pets nvarchar(max) AS JSON) as c
CROSS APPLY OPENJSON (pets)
WITH (givenName nvarchar(100)) as p
Although syntax in SQL Server and DocumentDB is slightly different, you might notice that most of the queries you can write in both systems.
References
JSON Data (SQL Server) - MSDN - Microsoft
https://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/
https://azure.microsoft.com/en-us/blog/azure-documentdb-javascript-as-modern-day-t-sql/
Comments
Anonymous
October 06, 2015
When should you use SQL Server's JSON support over DocumentDB and vice-versa?Anonymous
October 06, 2015
The comment has been removedAnonymous
October 07, 2015
The comment has been removedAnonymous
October 07, 2015
You are right that was a wrong copy/paste. Correct query is: SELECT JSON_QUERY(doc, '$.children') FROM Families Thanks, JovanAnonymous
July 11, 2016
I'm thinking about using the Json functionality in SQL server to replace linktables for m:n relations. For instance, if you have a recipe entity, that can use multiple ingredient entities, and an ingredient can be used in multiple recipes, traditionally you would have a "RecipeIngredient"-table, with a RecipeId, an IngredientId, and the "AmountUsed"Another approach is just have an "IngredientsJson"-field in the recipeentity, where the ingredientId's and AmountUsed per ingredient is represented in a json string. One thing that's lost for sure is the referential integrity of the ingredientId. but I'm wondering on the performance of this approach. Like "find me all recipe's where this ingredient is used for at least that amount". What if you have thousands of recipes ? Any thoughts on this ?- Anonymous
July 14, 2016
Is there some reason why you want to de-normalize tables? In your case it seems that separate tables will be optimal choice, since you need to query one table based on selected rows from the another one. JSON is good approach if you can identify one primary table and secondary related table that can be represented as an array, so you can avoid joins if you know that you will need to fetch all related ingredients together with receipt. However, if you need to do some filtering based on the values in JSON, SQL Server will need to parse each JSON array to check this condition.You can easily write that kind of query with CROSS APPLY OPENJSON(IngredientsJson), but performance would be better if you keep it in the separate table.
- Anonymous