JSON parsing - performance comparison
One of the first questions people asked once we announced JSON support in SQL Server 2016 was "Will it be slow?" and "How fast can you parse JSON text?" In this post, I will compare the performance of JSON parsing with the JSON_VALUE function against XML and string functions.
Experiment
In this experiment I have compared JSON_VALUE with xml.value and a few string functions, repeating the same function call 100,000 times and calculating the average execution time:
DECLARE @dt1 datetime2, @dt2 datetime2
DECLARE @spentTime float = 0
DECLARE @i int = 0
DECLARE @s nvarchar(100)
WHILE (@i < 100000)
BEGIN
    DECLARE @json AS NVARCHAR(200) = '{"Root":{"ProductDescription":{"ProductID":"1","ProductName": "Road Bike","Features": {"Warranty": "1 year parts and labor","Maintenance": "3 year parts and labor extended maintenance is available"}}}}'
    -- Start measurement (SYSDATETIME() returns full datetime2 precision, unlike GETDATE())
    SET @dt1 = SYSDATETIME()
    SET @s = JSON_VALUE(@json, N'$.Root.ProductDescription.ProductName')
    -- Uncomment one of the following (and comment out the line above) to test the other functions:
    -- SET @s = CHARINDEX('ProductName', @json)
    -- SET @s = PATINDEX('%ProductName%', @json)
    -- SET @s = SUBSTRING(@json, 10, 10)
    -- SET @s = SUBSTRING(@json, CHARINDEX('ProductName', @json)+15, CHARINDEX('"', @json, CHARINDEX('ProductName', @json)+15) - (CHARINDEX('ProductName', @json)+15))
    SET @dt2 = SYSDATETIME()
    -- Accumulate the duration of this single call
    SET @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)
    SET @i = @i + 1
END;
SELECT 'Execution time is ' + CAST((@spentTime / 100000 / 1000) AS nvarchar(100)) + ' microseconds per call'
I have created one JSON string and applied the functions JSON_VALUE, CHARINDEX, PATINDEX, and SUBSTRING over 100,000 iterations. The total time is calculated as the sum of the durations of the individual function calls. In each experiment I have uncommented the function that I want to test and commented out the others.
I have also added a more complex string expression:
SUBSTRING(@json, CHARINDEX('ProductName', @json)+15, CHARINDEX('"', @json, CHARINDEX('ProductName', @json)+15) - (CHARINDEX('ProductName', @json)+15))
This expression is a hard-coded simulation of the JSON_VALUE function. It finds the "ProductName" key, skips 15 characters from the start of the key (the 11-character key itself plus the closing quote, colon, space, and opening quote that follow it), and takes the text up to the next double quote. This expression returns the same result as the JSON_VALUE function, although it is not context sensitive and would not work on more complex JSON because it cannot tell whether ProductName is actually a key or just some random text. However, it is good enough for a performance comparison.
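For example, here is a minimal sketch (with a hypothetical input of my own) of why context matters: if the text "ProductName" also appears inside a value, the hard-coded expression anchors on the wrong occurrence, while JSON_VALUE understands the structure:
DECLARE @tricky nvarchar(200) = N'{"Comment":"ProductName is required","ProductName": "Road Bike"}'
-- JSON_VALUE is context aware and returns 'Road Bike':
SELECT JSON_VALUE(@tricky, N'$.ProductName')
-- The hard-coded expression anchors on the first occurrence of 'ProductName'
-- (inside the Comment value) and returns 'required' instead:
SELECT SUBSTRING(@tricky, CHARINDEX('ProductName', @tricky)+15, CHARINDEX('"', @tricky, CHARINDEX('ProductName', @tricky)+15) - (CHARINDEX('ProductName', @tricky)+15))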
To test xml.value I will use a slightly modified input that uses XML instead of JSON:
DECLARE @dt1 datetime2, @dt2 datetime2
DECLARE @spentTime float = 0
DECLARE @i int = 0
DECLARE @s nvarchar(100)
WHILE (@i < 100000)
BEGIN
    DECLARE @xml AS xml = N'<?xml version="1.0"?>
<Root>
  <ProductDescription>
    <Features>
      <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
      <Warranty>1 year parts and labor</Warranty>
    </Features>
    <ProductID>1</ProductID>
    <ProductName>Road Bike</ProductName>
  </ProductDescription>
</Root>'
    -- Start measurement
    SET @dt1 = SYSDATETIME()
    SET @s = @xml.value('(/Root/ProductDescription/ProductName)[1]', 'nvarchar(400)')
    SET @dt2 = SYSDATETIME()
    SET @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)
    SET @i = @i + 1
END;
SELECT 'Execution time is ' + CAST((@spentTime / 100000 / 1000) AS nvarchar(100)) + ' microseconds per call'
Here I have just converted the JSON into equivalent XML and used an XPath expression instead of a JSON path.
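For reference, here is the same kind of lookup on a deeper property with both syntaxes side by side (a small self-contained sketch based on the sample documents above):
DECLARE @json nvarchar(200) = '{"Root":{"ProductDescription":{"Features":{"Warranty":"1 year parts and labor"}}}}'
DECLARE @xml xml = N'<Root><ProductDescription><Features><Warranty>1 year parts and labor</Warranty></Features></ProductDescription></Root>'
-- JSON path:
SELECT JSON_VALUE(@json, N'$.Root.ProductDescription.Features.Warranty')
-- XPath:
SELECT @xml.value('(/Root/ProductDescription/Features/Warranty)[1]', 'nvarchar(400)')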
JSON vs Spatial
In the last experiment, let's compare JSON with a spatial point. I will create one point with coordinates (3,8) and an equivalent JSON array. In the code below I'm comparing the performance of reading the X coordinate of the point:
DECLARE @dt1 datetime2, @dt2 datetime2
DECLARE @spentTime float = 0
DECLARE @i int = 0
DECLARE @f float
WHILE (@i < 100000)
BEGIN
    DECLARE @g geometry = geometry::STGeomFromText('POINT(3 8)', 0),
            @json nvarchar(100) = '[3,8]'
    -- Start measurement
    SET @dt1 = SYSDATETIME()
    SET @f = @g.STX
    -- Uncomment to test the equivalent JSON_VALUE call instead:
    -- SET @f = CAST(JSON_VALUE(@json, '$[0]') AS float)
    SET @dt2 = SYSDATETIME()
    SET @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)
    SET @i = @i + 1
END;
SELECT 'Execution time is ' + CAST((@spentTime / 100000 / 1000) AS nvarchar(100)) + ' microseconds per call'
The results on my machine are 19-23 microseconds per call for the spatial point, and 9-11 microseconds per call for JSON_VALUE.
Results
The results of the tests are listed in the following table:
function            | time (microseconds)
--------------------|--------------------
@xml.value          | 30-50
JSON_VALUE          | 11-13
PATINDEX            | 7-9
CHARINDEX           | 7-9
SUBSTRING           | 1-2
SUBSTRING+CHARINDEX | 18-22
The interesting thing to notice is that parsing JSON text is 3-5x faster than parsing the equivalent XML, and twice as fast as reading the equivalent spatial point.
JSON_VALUE is about 2x slower than PATINDEX and CHARINDEX, which is expected since these functions have much simpler parsing logic. However, considering that JSON_VALUE uses more complex, context-aware parsing rules (it needs to keep track of the paths of parent objects) and also converts escaped characters, this might not be too bad.
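As a small illustration of that extra work, JSON_VALUE converts escape sequences in the returned value, which plain string functions would return as-is (a sketch with a hypothetical input of my own):
DECLARE @escaped nvarchar(100) = N'{"ProductName":"Road \"Bike\"\tDeluxe"}'
-- JSON_VALUE unescapes \" and \t, returning: Road "Bike"<TAB>Deluxe
SELECT JSON_VALUE(@escaped, N'$.ProductName')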
Direct access by position with SUBSTRING is 10x faster than JSON_VALUE and 5-6x faster than PATINDEX/CHARINDEX, which is also expected.
An interesting result is the simulation of JSON_VALUE via the SUBSTRING and CHARINDEX functions. Although these functions use simple matching, the combination is slower than the parser used by JSON_VALUE. The likely reasons are that the JSON text must be scanned three times to get the character indexes, and that the JSON parser uses exact character matches while the string functions use collation-aware matching rules.
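The collation difference is easy to demonstrate (a sketch assuming a case-insensitive database collation such as SQL_Latin1_General_CP1_CI_AS):
DECLARE @json nvarchar(100) = N'{"ProductName":"Road Bike"}'
-- CHARINDEX compares using collation rules, so under a case-insensitive
-- collation it finds the key even with different casing (returns 3):
SELECT CHARINDEX('PRODUCTNAME', @json)
-- Property names in a JSON path are matched exactly (case sensitive),
-- so this returns NULL:
SELECT JSON_VALUE(@json, N'$.PRODUCTNAME')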
This test was executed on SQL Server 2016 RC0. You can copy/paste the examples and run them on your own instance to check the ratios in your environment.
Comments
Anonymous
January 13, 2016
It would also be interesting to compare a .NET compiled stored proc inside SQL Server. Also, use datetime2 instead of datetime. datetime only has a 3 ms precision, and you are doing a nanosecond DATEDIFF.
Anonymous
April 24, 2017
I'd like to see a performance comparison for a large json object. Also how does this compare to a mongodb or documentdb alternative -particularly with large json objects?