Поделиться через


JSON Support in SQL Server 2016

JSON Support in SQL Server 2016

JSON support in SQL server is one of the most highly ranked requests with more than 1000 votes on the Microsoft connect site. We have announced that SQL Server 2016 will have a built-in JSON support. Note that this will not be the exact copy of native XML support that exists in SQL Server since 2005. Our goal is to create simpler but still useful framework for processing JSON documents. In this post I will give a brief overview of the JSON features that are planned for SQL Server 2016. Some rough timelines when these features will be available is:

  • SQL Server 2016 CTP2 - ability to format and export data as JSON string - you can find more information about this feature in a separate post.
  • SQL server 2016 CTP3 - ability to load JSON text in tables, extract values from JSON text, index properties in JSON text stored in columns, etc. 

JSON representation

First thing we should be aware is that built-in JSON support is not the same as the native JSON type. In SQL Server 2016, JSON will be represented as NVARCHAR type. The reasons are:

  • Migration - We found that people already store JSON as text, so they would need to change database schema and reload data to use our new features if we introduce a separate JSON type. In our implementation, developers can use JSON functionalities without any change in database.
  • Cross feature compatibility – NVARCHAR is supported in all SQL Server components, so JSON will also be supported everywhere. You can put JSON in Hekaton, temporal or column store tables, apply standard security policies including row level security, use standard B-Tree and FTS indexes, use JSON as a parameter or return value of procedures and UDFs, etc. You don’t need to think does JSON works with feature X – if NVARCHAR works with feature X then JSON will also work. Again there are some constraints – Hekaton and column store do not support LOB values so you can put only small JSON documents there. However, once we add LOB support in Hekaton and column store you will be able to store large JSON documents everywhere.
  • Client-side support – Currently we don’t have standardized JSON object type in client apps (something like XmlDom object). Web and mobile application, and JavaScript clients will naturally use JSON text and parse it with their native parsers. In JavaScript, a type that represents JSON is object. It is unlikely that they will implement some proxy of JSON type that exists only in a few RDBMS. In C#.Net, a lot of developers use JSON.Net parser with built in JObject or JArray type; however it is not a standard and probably it will not be part of ADO.NET. Even in that case we believe that C# apps will accept plain strings from database layer and parse it with their favorite parser. We are not talking only about apps. If you try to use JSON column in SSIS/SSRS, Tableau, Informatica ETL, they will still see it as text. We believe that even if we add a JSON type it will be represented as string outside SQL Server and parsed with some custom parser if needed. Therefore, we have not found any major reason to implement it as native JSON type.

Note that you can still use your own JSON type that can be implemented using CLR by importing JSON.NET library or something similar. Even if you don't like to code CLR UDTs you can find and download a lot of ready to use implementations, and you will not notice the difference between the native JSON type and JSON UDT. If this is fast enough in most of the .Net apps it might be fine for processing in SQL Server. If you believe that JSONB format from PostgreSQL or some compressed format like zipped JSON text is better option, you can parse JSON text in UDT, store it as JSONB in some binary property of CLR UTD, and create member methods that can use properties from that format. Currently we have not found that anyone even tried to create CLR UDT that encapsulate JSONB format, so we will not have that kind of experiments in this version.

Our focus will be on providing good functionalities and query optimization and not on storage. We know that PostgreSQL has a native type and JSONB support, but we still don't know is this faster or better than CLR alternatives so in this version we want to focus on the other things that are more important (do you want to see SQL Server with native type but without built-in functions that handle JSON – I don’t think so :) ). However, we are open for suggestions and if you believe the native type that can replace CLR JSON or plain text will help, you can create request on connect site so we can discuss it there. Again, our choice to start with FOR JSON and OPENJSON is the fact that these functionalities are requested in JSON connect item and probably only things that cannot be easily implemented with CLR.

So, our focus is on the export/import and some built-in functions for JSON processing. Someone might say - this will not be fast enough but we will see. We will talk about performance once we ship functionalities and improve performance if needed. However, be aware that built-in JSON parser is the fastest way to process JSON in database layer. You might use CLR type or CLR parsers as external assemblies but this will not be better than the native code that parses JSON.

Let’s see a short overview of these features in the following sections.

Exporting data as JSON - FOR JSON

First feature that will be available in SQL Server 2016 CTP2 is ability to format query results as JSON text using FOR JSON clause. If you are familiar with FOR XML clause you will easily understand FOR JSON:

 SELECT column, expression, column as alias
 FROM table1, table2, table3
 FOR JSON [AUTO | PATH]

When you add FOR JSON clause at the end of T-SQL SELECT query, SQL Server will take the results, format them as JSON text, and return it to client. Every row will be formatted as one JSON object, values in cells will be generated as values of JSON objects, and column names or aliases will be used as key names. We have two kinds of FOR JSON clause:

  • FOR JSON PATH that enables you to define the structure of output JSON using the column names/aliases. If you put dot-separated names in the column aliases, JSON properties will follow the naming convention.
    This feature is similar to FOR XML PATH where you can use slash separated paths.
  • FOR JSON AUTO that automatically create nested JSON sub arrays based on the table hierarchy used in the query, again similar to FOR XML AUTO.

If you have used PostgreSQL JSON functions and operators you will notice that FOR JSON clause is equivalent to JSON creation functions such as row_to_json, or json_object. The main purpose of FOR JSON is to create JSON based on variables and columns and format it according to the JSON specification, e.g.:

SET @json = ( SELECT 1 as firstKey, getdate() as dateKey, @someVar as thirdKey FOR JSON PATH)

-- Result is: { "firstKey": 1, "dateKey": "2016-06-15 11:35:21", "thirdKey" : "Content of variable" }

FOR JSON is a good choice if you want to:

  • Serialize array of objects that should be returned to the client. Imagine that you are creating JSON web service that returns suppliers and their products (e.g. using $extend option in OData services). You can search for suppliers, format each of them as JSON text and then send a separate query to get products for this supplier that will be formatted as JSON array and attached to the main supplier object. Other solution might be to join suppliers and products and format results on client side (EF will probably send separate queries). With FOR JSON clause you can join these two tables, add any predicate you want and formatting will be done in the database layer.
  • You want to de-normalize parent-child tables and put all child records as one value of parent cell formatted as JSON array. As an example you don't want separate tables for SalesOrderHeader and SalesOrderDetails - you can format order details for each header as JSON array that can be stored in single cell in SalesOrderHeader table. 

Transform JSON text to relational table - OPENJSON

OPENJSON is table-value function (TVF) that seeks into some JSON text, locate an array of JSON objects, iterate through the elements of array, and for each element generates one row in the output result. This feature will be available in CTP3. JSON text that is generated with FOR JSON clause can be transformed back to the relational form using OPENJSON. We will have the following types of OPENJSON TVF:

  • OPENJSON with predefined result schema that will enable you to define schema of the table that will be returned, as well as mapping rules that will specify what properties will be mapped to the returned columns.
  • OPENJSON without return schema where result of the TVF will be set of key-value pairs.

One example of OPENJSON function in T-SQL query is shown in the following example:

 SELECT Number, Customer, Date, Quantity
 FROM OPENJSON (@JSalestOrderDetails, '$.OrdersArray')
 WITH (
        Number varchar(200), 
        Date datetime,
        Customer varchar(200),
        Quantity int
 ) AS OrdersArray

 @JSalesOrderDetails is a text variable that contains an array of JSON objects in the property OrdersArray as it is shown in the following example:

'{"OrdersArray": [
   {"Number":1, "Date": "8/10/2012", "Customer": "Adventure works", "Quantity": 1200},
   {"Number":4, "Date": "5/11/2012", "Customer": "Adventure works", "Quantity": 100},
   {"Number":6, "Date": "1/3/2012", "Customer": "Adventure works", "Quantity": 250},
   {"Number":8, "Date": "12/7/2012", "Customer": "Adventure works", "Quantity": 2200}
]}'

OPENJSON will find an array in this property and return one row for each JSON object (i.e. element of the array). Four columns in the result set are defined in WITH clause. OPENJSON will try to find properties Number, Date, Customer, and Quantity in each JSON object and convert their values into columns in result set. By default, NULL will be returned is property is not found. This is the simplest example, but OPENJSON will be available to parse different formats of JSON text (e.g. nested JSON objects).

These functionalities are similar to the current OPENXML function – even the syntax is almost identical. The most important difference is the fact that OPENXML works with xml handles as an argument, and OPENJSON will directly work with JSON text.

If you are familiar with PostgreSQL JSON functions you will notice that OPENJSON is similar to json_each, json_object_keys, json_populate_record, and json_populate_recordset functions. the most important different is that we will have only two types of OPENJSON function that do the same thing as PostgerSQL.

When you will use OPENJSON?

Imagine that you are importing JSON documents in database and you want to load them in a table. Instead of parsing JSON at the client side and streaming set of columns to table, you can send JSON as-is, and parse it in database layer.

Another use case is analytic over JSON text. Since we will not introduce yet another language for JSON analysis, if you are storing JSON as text, you can transform it to a temp table and apply standard T-SQL analytic functions and operators.  

Built-in functions for JSON processing

We will also provide set of useful functions for parsing and processing JSON text. JSON built-in functions that will be available in SQL Server 2016 CTP3 are:

  • ISJSON( jsonText ) that checks is the NVARCHAR text input properly formatted according to the JSON specification. You can use this function the create check constraints on NVARCHAR columns that contain JSON text,
  • JSON_VALUE( jsonText, path ) that parses jsonText and extracts scalar value on the specified JavaScript-like path (see below some JSON path examples),

So what is the syntax of the path that will be used in built-in functions? We are using some kind of JavaScript-like syntax for referencing properties in JSON text. Some examples are:

  • '$' - references entire JSON object in the input text,
  • '$.property1' – references property1 in JSON object,
  • '$[5]' – references 5-th element in JSON array,
  • '$.property1.property2.array1[5].property3.array2[15].property4' – references complex nested property in the JSON object.

Dollar sign ($) represents the input JSON object (similar to starting / in XPath). You can add any JavaScript like property/array references after the context item to reference any nested property. One simple example of query where these built-in functions are used is: 

 SELECT t.Id, t.OrderNumber, t.OrderDate,
 
 JSON_VALUE(t.JOrderDetails, '$.Order.ShipDate')
 
FROM SalesOrderRecord AS t
 
WHERE ISJSON(t.JOrderDetails) > 0
 
 AND JSON_VALUE(t.JOrderDetails, '$.Order.Type') = 'C'
 

Again if we compare this with PostgreSQL you will notice that JSON_VALUE is equivalent to json_extract_path_text, ->>, or #> operators.

How to index JSON documents?

Currently we are not planning to add any custom JSON indexes. Since JSON is plain text you can use existing B-tree or full text search indexes for some advanced search query. As an example you can create computed
column using JSON_VALUE function and create standard index on that column, as it is shown in the following example:

 CREATE TABLE SalesOrderRecord (
 
 Id int PRIMARY KEY IDENTITY,
 
 OrderNumber NVARCHAR(25) NOT NULL,
 
 OrderDate DATETIME NOT NULL,
 
 JOrderDetails NVARCHAR(4000),
 
 Quantity AS CAST(JSON_VALUE(JOrderDetails, '$.Order.Qty') AS int),
 
 Price AS JSON_VALUE(JOrderDetails, '$.Order.Price'),
 
 )
 
 GO
 
 CREATE INDEX idxJson
 
 ON SalesOrderRecord(Quantity)
 
 INCLUDE (Price);
 

Index will be used for any seek operation if query optimizer notices that the JSON_VALUE function is used in the query with the identical <column, path> pair. We will see in practice is this sufficient enough or we would need to implement some custom index.

So this is a short overview of new JSON functionalities. We will see more details in separate posts.

Comments

  • Anonymous
    May 17, 2015
    Awesome stuff! Can't wait for the preview bits and be able to try it out on some of our applications!

  • Anonymous
    May 18, 2015
    The comment has been removed

  • Anonymous
    May 18, 2015
    JSON for tabular data is just stupid - at least in the obvious way you represent it. In the sample json above, the strings "Number", "Date", "Customer" and "Quantity" are sent and parsed 4 times each (and let's be honest - quite often it's way more than 4 rows we return). It's a waste of data transfer and cpu cycles for parsing. It would be better to have a header with the columns (which allows including datatype), and an array of values. Example: {"OrdersArray": {  Schema: { "Number":"Int32","Date":"Date","Customer":"String","Quantity":"Int32"} },  Rows: [ [1,8/10/2012,"Adventure works",1200], [1,5/11/2012,"Adventure works",100], [1,1/3/2012,"Adventure works",250], [1,12/7/2012,"Adventure works",2200] ] }

  • Anonymous
    May 18, 2015
    The comment has been removed

  • Anonymous
    May 18, 2015
    @Philip We will have functions that matches Postgre operators e.g.: JSON_VALUE can be used instead of -> operator, FOR JSON instead of row_to_json or array_to_json, OPENJSON instead of json_each, json_object_keys, etc. The most common functions for processing JSON will be there - we are just using more T-SQL/XML like syntax. Postgre has a native type and internal JSONB format but this is a storage option that is not related to functionalities. We can create type later if we see that is necessary. They do not use some dedicated JSON index other than standard expression based indexes and gin indexes on JSON properties (we will also be able to use standard B-tree indexes and FTS). So what you see as the major missing feature?

  • Anonymous
    May 18, 2015
    The comment has been removed

  • Anonymous
    May 18, 2015
    I'm proud to be one of the few downvoters on that Connect item. Nested tables all the things.

  • Anonymous
    May 18, 2015
    Do you have a typo here: SET @json = ( SELECT 1 as firstKey, getdate() as dateKey, @someVar as dataKey FOR JSON PATH) -- Result is: { "firstKey": 1, "dateKey": "2016-06-15 11:35:21", "someVar" : "Content of variable" } Shouldn't it be -- Result is: { "firstKey": 1, "dateKey": "2016-06-15 11:35:21", "dataKey" : "Content of variable" }

  • Anonymous
    May 19, 2015
    Thanks @Victor, I have corrected example (now it is thirdKey)

  • Anonymous
    May 19, 2015
    You're missing the point of when JSON would be used with a relational database. The reason why someone is storing JSON in an nvarchar(max) right now is because the JSON cannot be represented in a relational database because it's Jagged. That is the structure is generally known, but it can be different with every result. Consider a user defined search that the user can specify the fields that come back and the structure of the returning data. In this case you won't know the structure nor which columns will be there, and every single set of results from that search will be different in structure. The point to supporting Json is to provide a function that gets the data in directly, allows for indexing to be applied pro-grammatically and then be able to search and select on that structure using dynamic style querying (i.e. if you were using linq, you'd cast the result as dynamic and then blindly write the Linq against the possible values that may be pragmatically known using Expression Trees.) Yes, there are cases where you need to export known relational data and do the same back in, but that is not the case where this will be used 99% of the time. You really need to rethink this and get serious about variable structure data. Whether it be xml or json or some other random format.

  • Anonymous
    May 19, 2015
    I am in full agreement with James Hancock; please make sure it works well with code that does not assume that every row in the table has an identical JSON structure. A very likely scenario (in that it's one I use today) is a tagged union-like construct, with a discriminator column for "type of row" and different with commonalities or entirely different JSON structures in another column.

  • Anonymous
    May 19, 2015
    I agree with James Hancock and Ernst-Georg Schmid. I hope Microsoft not just making a half-baked solution. The argument that current JSON data is stored as a text string in sql server is exactly the reason that you need to create a native JSON type. To me, the biggest concern is performance. Does anyone believe with tens or hundreds of millions of records in a table with JSON data and without index, the performance can be usable? It is sad to see many MS product decision maker (esp. in SQL Server field) keeps on making dumb decisions, including releasing sql server every 2 years, yet, with few eye-catching and also mature features.

  • Anonymous
    May 19, 2015
    The comment has been removed

  • Anonymous
    May 20, 2015
    It amazes me how some people post without any consideration of manners and civility.  My experience is that little developer guys are much more polite when they have to talk to a man face to face.  Curious.

  • Anonymous
    May 21, 2015
    I think a native type will be useful to allow the ability modifying existing documents without reinserting the entire document. Postgres 9.5 has this feature: wiki.postgresql.org/.../What%27s_new_in_PostgreSQL_9.5 Xml doesn't allow patching which is highly disappointing. In theory, I could scan the document to find where the node begins and then use the .write feature if I'm storing the data in a Nvarchar(max) field, but that means I'm parsing the structure multiple times. The GIN indexes they support also index individual attributes, where as the b+ nc would require you to create derived persisted columns to gain the same benefit. Ultimately, until there is a type, I fear that the only use of this feature will be to make it slightly easier to pass a list of ids from a client app to a storeproc. (Use OPENJSON to parse it rather than a string splitting TVF). For that matter, I think I'd rather have a string splitting TVF over half baked json. And I'd also like a  string concat aggregate. In postgres, with array and jsonb support, it's now very reasonable to avoid the common EAV pattern and replace it with a single array or jsonb column. In Sql server, for a small number of values, I can get away with storing it in a varchar/nvarchar field, and for large values where I care about performance of retrieval I have to use EAV.

  • Anonymous
    May 27, 2015
    Nice feature by Microsoft .

  • Anonymous
    May 31, 2015
    For me, the storage aspect doesn't matter that much and I accept your reasoning about using nvarchar. You say that your "focus will be on providing good functionalities and query optimization and not on storage." Does it mean that there will be support for indexing array values in JSON? For example, in the PostgreSQL example of the JSON data of a record containing "tags": [ "enim", "aliquip", "qui" ]. The ability to have the "tags" field indexed in a way that enables fast queries with WHERE conditions such as "WHERE "tags" HAS "qui"" and "WHERE NOT( "tags" HAS "qui" )" would be the key to avoiding many of the cases where the data needs to be distributed to multiple tables for no other reason than the fact that a field is multi-valued. I don't see how I could do that with a computed column and JSON_VALUE.

  • Anonymous
    June 02, 2015
    You say "JSON support in SQL server is one of the most highly ranked requests" yet your not going to "do your best" and support it all the way? We would be saving xml as text also if an xml type did not exist so your reasoning to use the "NVARCHAR type" for JSON because "We found that people already store JSON as text" is beyond reason.

  • Anonymous
    June 17, 2015
    The comment has been removed

  • Anonymous
    June 20, 2015
    The comment has been removed

  • Anonymous
    July 01, 2015
    Jovan, Thanks for taking a look.  Please reach out at info@json4sql.com.  I'd like to discuss some different benchmarking scenarios and perhaps we could collaborate. I agree plain text will be faster if you simply want to pull data out and push data in without actually getting values out of the data or performing any operations on the data.  However, this is not usually the case and definitely not as useful as the other JSON storage solutions on the market. Some advantages you have working outside of the CLR that would make JSON4SQL much faster is:

  1. The ability to work with unmanaged code so you can use memcmp, memcpy, pointers, faster casting of the binary elements of the JSONB structure directly to SQL data types, etc.
  2. The ability to create your own indexing mechanism (something the CLR doesn't allow me access to).  This makes Postgres's solution very attractice and performant.  I would love to have the ability to replicate their indexing capabilities in SQL Server or at least the capabilities that already exist for XML. I would be happy to discuss further.
  • Anonymous
    July 08, 2015
    We decided not to go for http://www.json4sql.com/ because of potentially not being able to scale due to lack of indexing - of course what they have achieved via de CLR is impresive - and looks like what is coming in SQL Server 2016 is going to be even worse... Dear MSSQL team, what we need is a mix of RDBML with NOSQL by means of a powerful document based storage data type. Was expecting all the power we now have with XML - or even more - but with JSON. :(

  • Anonymous
    July 20, 2015
    Jovan, Thanks for this post. To be honest, I'm surprised JSON will not be implemented as data type (I was expecting something like jsonb in PostgreSQL). It sounds to me like if you said something like "Store date/time values in nvarchar column, put check constraint with ISDATE function on that column and we will provide functionalities for working with date/time for you." :-) IMHO, it is not a good approach. If I want to store some kind of values in a column, setting appropriate data type is the most natural way to constraint possible values and to declare the meaning of the data. JSON is not "any text". Different kinds of operations are needed for regular text and for JSON. Is all speaks for separate data type. Isn't this good enough argument for implementing JSON as data type? Or is something wrong with this point of view? :-) Thanks, Petr

  • Anonymous
    July 24, 2015
    There is a recently launched web tool that allows you to run SQL on JSON feeds (such as APIs) - http://sqall.co - I guess it's a bit different to what SQL Server will offer.

  • Anonymous
    August 04, 2015
    It is true that we have not created a type/index, but if we find that current JSON implementation is the bottleneck in your workloads we will optimize it. We would like to know how you would use JSON in order to find out what improvements we might add. Feel free to take CTP3, import your JSON documents, use current built-in functions and indexes, and post your performance results. As far as we see, most of the JSON operations are simple - seek into JSON text, find some property on a json path and take the value that will be used to filter or order results. If you index properties using regular indexes it should work fine. We don't have workloads where someone will use XQuery-like functions, filter results by all properties in JSON docs, etc. Please describe how you will use JSON so we can decide how to proceed based on real data. Jovan

  • Anonymous
    August 22, 2015
    For those of you interested in richer JSON storage and search capabilities, check out Microsoft Azure DocumentDB. Cheers, Trevor Sullivan Microsoft MVP: PowerShell http://trevorsullivan.net http://twitter.com/pcgeek86

  • Anonymous
    August 30, 2015
    Does native JSON support extend to SQL Azure v12 too?

  • Anonymous
    September 02, 2015
    It will be available in Azure SQL Database as soon as we complete OPENJSON and other built-in functions.

  • Anonymous
    September 03, 2015
    Unfortunately as far as I can see, the new CTP 2.3 is not supporting JSON to SQL functions Do we have to wait CTP3 for that?

  • Anonymous
    September 10, 2015
    Introduction to JSON in sql server www.youtube.com/watch

  • Anonymous
    September 24, 2015
    Long waited feature finally added. Incase some wants to process JSON Data and load to SQL Server using SSIS way here is the custom Adapter zappysys.com/.../ssis-json-file-source

  • Anonymous
    September 27, 2015
    One of the key structures in JSON is an ordered list of values. In my use case, documents stored in a content management system, I sometimes run in to the use case that I want to filter those records where one element in the list has a certain value. I would like to now how that could be done with the JSON support as it is envisioned by Microsoft. Standard B-Tree indixes do not support multiple values per row and Full Text Indices are not transactionally safe.

  • Anonymous
    November 04, 2015
    This is half-baked JSON solution because it's not native and you cannot index JSON fields.. You can't be serious about this. Storing JSON as nvarchar, seriously?

  • Anonymous
    November 09, 2015
    Hi @Stan, In this version you can index JSON properties (one or more properties that you specify in index definitions) using standard indexes to optimize filter and sort operations. This is similar to MongoDB single and compound indexes. You can also index elements in JSON arrays using FTS which will improve performance of CONTAINS queries. This is similar to MongoDb multi-key index. I agree that there is no ability to automatically index all possible properties in JSON; however, in SQL Server (and most of the other RDBMS) you cannot automatically index all possible columns in tables, because generally indexes on all properties slow down updates. If we create some index that automatically index everything without any cost during updates we would apply it on tables too. The current solution is built for developers who know what are the most frequently used paths and they can optimize these workloads using current indexes. Feel free to install SQL Server and other providers that have JSON support and try to compare performance on the identical workloads. I would be happy to see results if you publish it on some blog, and we will work on improvements based on your data. Thanks, Jovan