Use WITH clause in OPENJSON to improve parsing performance
OPENJSON function has a WITH clause where you can specify what fields should be extracted from input JSON. This might improve performance of your queries compared to the case where you use OPENJSON without schema and later extract information from the parsed JSON using JSON_VALUE function.
I have found some examples where people using OPENJSON to parse JSON arrays and then JSON_VALUE to extract information from the JSON object.
As an example, imagine that you have JSON array that contains information about cars:
[
{"year":2001,"make":"ACURA","model":"CL"},
{"year":2001,"make":"ACURA","model":"EL"},
{"year":2001,"make":"ACURA","model":"INTEGRA"},
{"year":2001,"make":"ACURA","model":"MDX"},
{"year":2001,"make":"ACURA","model":"NSX"},
{"year":2001,"make":"ACURA","model":"RL"},
{"year":2001,"make":"ACURA","model":"TL"}
]
You can take the sample from the following URL: https://raw.githubusercontent.com/arthurkao/vehicle-make-model-data/master/json\_data.json
Let assume that we want to find number of cars with model = 'Golf'. One approach would be to parse json text and return each object in the array, then check is the model value in JSON equal 'Golf':
SELECT count(*)
FROM OPENJSON(@json)
WHERE JSON_VALUE(value, '$.model') = 'Golf'
The problem in this approach is the fact that SQL Server parses the same JSON text twice - once when OPENJSON breaks elements from JSON array into JSON object (returned as value column), and second time when JSON_VALUE parses value returned by OPENJSON to extract model.
Another approach would be to tell OPENJSON that it needs to return model column when input JSON is parsed and then filter results by this column:
SELECT count(*)
FROM OPENJSON(@json) WITH(model nvarchar(20) )
WHERE model = 'Golf'
In this case, input JSON will be parsed only once.
Performance comparison
Let's see in practice what are the differences:
SET STATISTICS TIME ON
SELECT count(*) FROM OPENJSON(@json) WHERE JSON_VALUE(value, '$.model') = 'Golf'
SELECT count(*) FROM OPENJSON(@json) WITH( model nvarchar(20) ) WHERE model = 'Golf'
SET STATISTICS TIME OFF
Here are results of the queries:
SQL Server Execution Times:
CPU time = 656 ms, elapsed time = 651 ms.
SQL Server Execution Times:
CPU time = 204 ms, elapsed time = 197 ms.
As you can see, WITH clause specify that OPENJSON should immediately return properties from the JSON array without second parsing. Performance of the queries might be increased 3 times if you avoid double parsing.
Comments
- Anonymous
June 03, 2017
There's a typo in JSON: the last character should be "]".- Anonymous
June 08, 2017
Thanks, I have fixed it.- Anonymous
June 27, 2017
Hi, Jovan!Could you, please, take a look at this Microsoft Connect item? Thanks! - Anonymous
June 27, 2017
Here's link: https://connect.microsoft.com/SQLServer/feedback/details/3136467/ssms-17-1-sql-server-2017-ctp-2-1-error-when-showing-up-drives-list-and-duplication-drives-in-list-when-attaching-database
- Anonymous
- Anonymous
- Anonymous
June 25, 2017
You have brought up a very great points , appreciate it for the post. - Anonymous
September 04, 2018
Hi Jovan,It doesn't seem like a user-defined table type works in the WITH clause. For example, instead of...SELECT rd.BusinessEntityID ,rd.BusinessEntityMedicaidID ,rd.TransactionID ,rd.ReasonFROM OPENJSON(@Response) WITH(BusinessEntityID varchar(10) ,BusinessEntityMedicaidID varchar(9) ,TransactionID varchar(36) ,Reason varchar(2000)) AS rd;… code something like this:CREATE TYPE dbo.ResponseData AS TABLE (BusinessEntityID varchar(10) ,BusinessEntityMedicaidID varchar(9) ,TransactionID varchar(36) ,Reason varchar(2000));DECLARE @ResponseData dbo.ResponseData;SELECT rd.BusinessEntityID ,rd.BusinessEntityMedicaidID ,rd.TransactionID ,rd.Reason FROM OPENJSON(@Response) WITH(@ResponseData) AS rd;… or maybe preferably, use the table type directly:SELECT rd.BusinessEntityID ,rd.BusinessEntityMedicaidID ,rd.TransactionID ,rd.Reason FROM OPENJSON(@Response) WITH(dbo.ResponseData) AS rd;(These examples assume the column names match the JSON key names and they are all at the root level, of course.)Do you think this might be worth considering for implementation as a new feature?