Поделиться через


JSON parsing 10x faster than XML parsing

Usually, when people talk about the performance of JSON in SQL Server, they are starting with “JSON is not actually a native type, it is just a plain text” and imply that it is probably too slow. It is true that JSON is stored as NVARCHAR and that there is no special type like for XML or Spatial. However, it does not imply that performance is worse than performance of full-blown types. NVARCHAR is not a plain text as someone would assume – it is probably the most optimized non-scalar type in SQL Server. JSON leverages simplicity of NVARCHAR and this might be one of the biggest strengths and not weaknesses. In this post, I will compare performance of XML and JSON in a simple scenario – accessing a field on a specified path, and show that paring JSON plain text might be 10x faster than using native types.

Experiment

In this experiment, I’m using the same structure of JSON and XML and trying to get a value on the specified path (XPATH for XML and JSON path for JSON). Here are sample JSON/XML documents used in this experiment.

 DECLARE @json AS NVARCHAR(200) = '{"Root":{"ProductDescription":{"ProductID":"1","ProductName": "Road Bike","Features": {"Warranty": "1 year parts and labor","Maintenance": "3 year parts and labor extended maintenance is available"}}}}'
 DECLARE @xml as xml = N'<?xml version="1.0"?><Root><ProductDescription><Features><Maintenance>3 year parts and labor extended maintenance is available</Maintenance><Warranty>1 year parts and labor</Warranty></Features><ProductID>1</ProductID><ProductName>Road Bike</ProductName></ProductDescription></Root>'

XML text is slightly bigger, but it is pre-parsed into strongly typed XML variable so I’m not counting the time needed to process XML (this is even the best case for XML variable although sometime not realisic case).
I’m measuring the time needed to get the value both from XML and JSON on a specified path.

I have used the following code to test XML value method:

 set @dt1 = SYSDATETIME()
set @s = @xml.value('(/Root/ProductDescription/ProductName)[1]', 'nvarchar(4000)')
set @dt2 = SYSDATETIME()
set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

And for JSON I have the similar code:

 set @dt1 = SYSDATETIME()
set @s = JSON_VALUE(@json, N'$.Root.ProductDescription.ProductName')
set @dt2 = SYSDATETIME()
set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

I’m running 10000 iterations and getting the average spent time.

Average time to parse JSON and get the value using JSON_VALUE() function is around 3 microseconds while the equivalent action with XML typed variable and value() method takes between 30 and 40 microseconds.

This shows that parsing JSON “plain text” is 10x faster than parsing strongly typed XML variable.

Code

The code used in this experiment is shown below so you can try it on your server. I have used SQL Server 2017 Express edition in this experiment, and you might get different result in your environment.

XML Test

 

 DECLARE @dt1 datetime2(7), @dt2 datetime2(7)

declare @spentTime float = 0

declare @i int = 0

declare @s nvarchar(100)

while(@i<100000)
begin
DECLARE @xml as xml = N'<?xml version="1.0"?><Root><ProductDescription><Features><Maintenance>3 year parts and labor extended maintenance is available</Maintenance><Warranty>1 year parts and labor</Warranty></Features><ProductID>1</ProductID><ProductName>Road Bike</ProductName></ProductDescription></Root>'

-- Start
set @dt1 = SYSDATETIME()
set @s = @xml.value('(/Root/ProductDescription/ProductName)[1]', 'nvarchar(4000)')
set @dt2 = SYSDATETIME()
set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

set @i = @i+1

end;

SELECT 'Execution time is ' + CAST( (@spentTime / 100000 ) as nvarchar(100) ) + ' nano-seconds per call'

JSON Test

 DECLARE @dt1 datetime2(7), @dt2 datetime2(7)
 declare @spentTime float = 0
 declare @i int = 0
 declare @s nvarchar(100)

 while(@i<100000)
 begin

 DECLARE @json AS NVARCHAR(200) = '{"Root":{"ProductDescription":{"ProductID":"1","ProductName": "Road Bike","Features": {"Warranty": "1 year parts and labor","Maintenance": "3 year parts and labor extended maintenance is available"}}}}'

 -- Start
 set @dt1 = SYSDATETIME()
 set @s = JSON_VALUE(@json, N'$.Root.ProductDescription.ProductName')
 set @dt2 = SYSDATETIME()
 set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

 set @i = @i+1

 end;

 SELECT 'Execution time is ' + CAST( (@spentTime / 100000) as nvarchar(100) ) + ' nano-seconds per call'

Comments

  • Anonymous
    November 13, 2017
    Hi, thanks for the analysis, it's possible to improve the xml performance by extracting the text() node:I also reduced the data type to match the variable to avoid unnecessary truncation cycles.set @s = @xml.value('(/Root/ProductDescription/ProductName/text())[1]', 'nvarchar(100)')On my fairly pathetic desktop, this reduces the xml time by 50% +
    • Anonymous
      November 13, 2017
      That's good idea, thanks.I will update example and re-publish results.
    • Anonymous
      February 07, 2018
      Mi Mister Magoo, you might want to read this: https://stackoverflow.com/a/43495014/5089204, especially section "UPDATE 2".
  • Anonymous
    November 13, 2017
    Certainly a good opportunity to improve the XML parser, right?
    • Anonymous
      November 13, 2017
      Hi, you are right, but this would be re-design and not just improvement. XML parser is based on msxml library which is more powerful and complex. JSON parser is actually the same parser that parses T-SQL queries. We have just placed 20-lines of JSON grammar into the parser generator and got highly optimized JSON parser (because T-SQL parser generator is already very optimized so it can handle compilation of complex T-SQL queries). To do the same for XML we would need to completely remove the existing parser manually write XML grammar (which is more complex) and follow the JSON approach which might introduce some regressions.
  • Anonymous
    November 14, 2017
    When can we look forward to query plans being stored as JSON?
    • Anonymous
      November 14, 2017
      This is interesting idea, but it is unlikely to happen in the near future. We might keep both formats, to avoid changing all tools that already work with XML or use XSLT to transform plans. If it is just to view it, we might consider to add some XSL transformation that transform XML to JSON, but I'm not sure how big would be the benefit.It would be faster if you need to scan all plans and filter them by some field, but if there are a lot of plans maybe it would be better to store them in columnstore index (especially now when CCI supports NVARCHAR(MAX) so it can contain LOBs).If you can create some example that shows that plans stored as hypothetical JSON would be faster, and that the absence of XPath is not a problem, this might be interesting idea.
  • Anonymous
    February 07, 2018
    Hi Jovan, I've posted this before, but my comment was swallowed somehow... This "experiment" is simplified to undue circumstances. You might read this https://stackoverflow.com/a/43495014/5089204, especially section "UPDATE 2" and change this click-bait-story to something more realistic...