Combining relational and NoSQL concepts in SQL Server

[アーティクル]
09/01/2015

Intro

In this article we will see how you can combine NoSQL and relational concepts in SQL Server database.

We will see how you can transform tables to JSON objects and vice versa using FOR JSON and OPENJSON functions.

Relational vs. NoSQL approach

Traditional relational schema design follow strict normalization approach – every logical entity is stored as a separate table and there are foreign key relationships between related entities. Imagine products that may have many reviews – you should create one table for products, another for reviews, propagate foreign key from product into reviews table, and add a foreign key relationship between them, as shown in the following figure:

Although this is a proper database design, if you have many related entities you might end-up with a lot of tables and foreign key relationships, you will need to join tables every time you need to take product information, maintain indexes or foreign key columns to speed-up joins, etc. Example of query that reads products and related reviews is shown in following example:

 SELECT Production.Product.ProductID AS ID, Production.Product.Name, 
 Color, ListPrice, ReviewerName, ReviewDate, Rating FROM Production.Product JOIN Production.ProductReview ON Production.ProductReview.ProductID = Production.Product.ProductID

Every time you need to retrieve related reviews you need to join tables and you would probably need to create additional index on foreign key column in Review table. Results are shown in the following table:

ID	Name	Color	ListPrice	ReviewerName	ReviewDate	Rating
709	Mountain Bike Socks, M	White	9.50	John Smith	2007-10-20 00:00:00.000	5
937	HL Mountain Pedal	Silver/Black	80.99	David	2007-12-15 00:00:00.000	4
937	HL Mountain Pedal	Silver/Black	80.99	Jill	2007-12-17 00:00:00.000	2
798	Road-550-W Yellow, 40	Yellow	1120.49	Laura Norman	2007-12-17 00:00:00.000	5

On the client-side you probably need to process results and merge two 937 product rows as a single object with two related reviews items (e.g. as an array with two elements). Instead of single query that joins two tables you can use two separate queries. First query would read product information ad second query would read related product reviews. Some ORM tools use this approach when they need to read related informaiton. This is not so bad unless if you have a lot of related entities (e.g. product images, product attachments, product categories, etc.) and you need to run several independent queries for a single product.

One of the reasons why NoSQL systems become popular is the fact that you can use composite objects where you can store attributes of primary entity (product in our example) with related records (product reviews) within the primary entity as an array or collection of sub-objects. As an example, in MongoDb or DocumentDb you will create one JSON document for Product and add related reviews as an array of JSON objects like in the following example:

Products collection
{"ProductID":709,"Name":"Mountain Bike Socks, M","Color":"White","Reviews":[{"Reviewer":{"Name":"John Smith","Email":"john@fourthcoffee.com"},"ReviewDate":"2007-10-20T00:00:00","Rating":5,"ModifiedDate":"2007-10-20T00:00:00"}]}
{"ProductID":798,"Name":"Road-550-W Yellow, 40","Color":"Yellow","Reviews":[{"Reviewer":{"Name":"Laura Norman","Email":"laura@treyresearch.net"},"ReviewDate":"2007-12-17T00:00:00","Rating":5,"ModifiedDate":"2007-12-17T00:00:00"}]}
{"ProductID":937,"Name":"HL Mountain Pedal","Color":"Silver\/Black","Reviews":[{"Reviewer":{"Name":"David","Email":"david@graphicdesigninstitute.com"},"ReviewDate":"2007-12-15T00:00:00","Rating":4,"ModifiedDate":"2007-12-15T00:00:00"},{"Reviewer":{"Name":"Jill","Email":"jill@margiestravel.com"},"ReviewDate":"2007-12-17T00:00:00","Rating":2,"ModifiedDate":"2007-12-17T00:00:00"}]}

You have simpler data model, no JOINs, no additional requests/queries/indexes, all data available in the same record. However, this model is also far from perfect. Although it is a good choice for smaller data models, in more complex models you might end-up with heavy objects, or you would need to organize objects in separate collections. In some cases you might need to join objects stored in different collections, and you would find that this is not possible (e.g. in DocumentDb) or you need to write complex map/reduce jobs for a simple join (e.g. in MongoDb).

How to transform relational schema into NoSQL in SQL server?

SQL Server 2016/Azure SQL Database introduce hybrid approach where you can choose between relational and NoSQL concepts. As an example, if you have products and their reviews you don’t need to create separate tables if you don’t want them. You can create additional columns in the primary table that will contain collection of related entities formatted as JSON arrays:

 ALTER TABLE Production.Product
 ADD Reviews NVARCHAR(MAX) 
 CONSTRAINT [Reviews are formatted as JSON] CHECK(ISJSON(Reviews)>0)

In this example, we are adding a simple text column with a constraint that specifies that reviews should be formatted as JSON (similar to NoSQL databases). There is no new syntax for JSON constraint - you can use standard check constraint with function ISJSON that checks is Reviews text formatted as a JSON object.

If we want to move related product reviews from a separate table into this column we can use a simple UPDATE query:

 UPDATE Production.Product
 SET Reviews = (
 SELECT ReviewerName AS [Reviewer.Name],
 EmailAddress AS [Reviewer.Email],
 ReviewDate, Rating, ModifiedDate
 FROM Production.ProductReview
 WHERE Production.ProductReview.ProductID = Production.Product.ProductID
 FOR JSON PATH)

Inner query fetches all related reviews, formats them as JSON documents using FOR JSON clause and stores them as JSON text in Reviews column. We can format properties in JSON document using dot syntax (e.g. Product.Name will be created as a Name property in Product object).

If you don't have existing related table, you can insert new values as array of JSON objects, e.g.:

 INSERT INTO Production.Products(Name, Reviews)
 VALUES('HL Mountain Pedal', 
 '[
 {"Reviewer":{"Name":"John Smith","E-mail":"john@fourthcoffee.com"},"Rating":5},
 {"Reviewer":{"Name":"Mike"},"Rating":4}
 ]')

Values can be inserted as a plain.

Now, we can read products and related reviews with a single query:

 SELECT ProductID, Name, Color, ListPrice, Reviews
 FROM Production.Product
 WHERE Reviews IS NOT NULL

Results are shown in the following table:

ProductID	Name	Color	ListPrice	Reviews
709	Mountain Bike Socks, M	White	9.5	[{"Reviewer":{"Name":"John Smith","E-mail":"john@fourthcoffee.com"},"ReviewDate":"2007-10-20T00:00:00","Rating":5,"ModifiedDate":"2007-10-20T00:00:00"}]
798	Road-550-W Yellow, 40	Yellow	1120.49	[{"Reviewer":{"Name":"Laura Norman","E-mail":"laura@treyresearch.net"},"ReviewDate":"2007-12-17T00:00:00","Rating":5,"ModifiedDate":"2007-12-17T00:00:00"}]
937	HL Mountain Pedal	Silver/Black	80.99	[{"Reviewer":{"Name":"David","E-mail":"david@graphicdesigninstitute.com"},"ReviewDate":"2007-12-15T00:00:00","Rating":4,"ModifiedDate":"2007-12-15T00:00:00"},{"Reviewer":{"Name":"Jill","E-mail":"jill@margiestravel.com"},"ReviewDate":"2007-12-17T00:00:00","Rating":2,"ModifiedDate":"2007-12-17T00:00:00"}]

As you can see, reviews are returned as a collection of JSON objects. There is exactly one row per each product so you do not need any transformation on the client side. This is a perfect choice if your client already expects JSON format for related records (e.g. if you are using some Angular, Knockout or other template engines that inject JSON model into HTML view).

As an alternative if you want to “join” products with related reviews as in the query above, you can use following query:

 SELECT ProductID, Name, Color, ListPrice, ReviewerName, ReviewDate, Rating
 FROM Production.Product
 CROSS APPLY OPENJSON(Reviews)
 WITH ( ReviewerName nvarchar(30) '$.Reviewer.Name', ReviewDate datetime2, Rating int)
WHERE Reviews IS NOT NULL

OPENJSON table value function takes related reviews formatted as JSON and returns them as a table. In the WITH part you can specify schema of returned table. Column names match the names of properties in JSON object, and if you have nested property (e.g. Reviewer.Name) you can specify JSON path where this value can be found. CROSS APPLY joins parent row with table returned by OPENJSON functions. Results are shown in the following table:

ID	Name	Color	ListPrice	ReviewerName	ReviewDate	Rating
709	Mountain Bike Socks, M	White	9.50	John Smith	2007-10-20 00:00:00.000	5
937	HL Mountain Pedal	Silver/Black	80.99	David	2007-12-15 00:00:00.000	4
937	HL Mountain Pedal	Silver/Black	80.99	Jill	2007-12-17 00:00:00.000	2
798	Road-550-W Yellow, 40	Yellow	1120.49	Laura Norman	2007-12-17 00:00:00.000	5

In both cases you don’t need to scan two tables – all necessary results are taken with a single query.

OK, but how to update reviews?

This is important thing you would need to consider. Updating JSON collection of objects is not so efficient as updating collection of rows in a separate table. In NoSQL models, updating fields imply updating one big blob of data where you

SQL Server provides JSON_MODIFY function that enables you to update a value on some path, delete or add new node, or append object at the end of the list of nodes.

In the following procedure is shown how can you append new review into the collection of product reviews:

 CREATE PROCEDURE
 AddProductReview(@productid int, @reviewerName nvarchar(30), @reviewerEmail nvarchar(30), @reviewDate datetime2, @rating int)
 AS
 BEGIN

 UPDATE Production.Product
 SET Reviews = JSON_MODIFY(Reviews, 'append $',
((SELECT @reviewerName AS [Reviewer.Name], @reviewerEmail AS [Reviewer.Email],
         @reviewDate AS ReviewDate, @rating AS Rating, getdate() AS ModifiedDate
  FOR JSON PATH, WITHOUT_ARRAY_WRAPPER))
 )
 WHERE ProductID = @productid;

END

this procedure accepts product id, and information about new review as parameters. You would need to transform review data to JSON using FOR JSON clause, and then use JSON_MODIFY to append the new review to the list of the existing reviews. If you provide review as JSON text to the procedure, you could simply append the input JSON object to the existing array.

Summary

Although SQL Server is relational database you don't need to use strict relational theory in your database design. With new JSON support that is coming in SQL Server 2016 and Azure SQL Database you can choose when to follow strict concepts of relational schema design, and when to format objects as in the NoSQL systems. SQL Server gives you flexibility to choose the best format for your applications and enables you to easily transform table data into JSON documents and JSON documents to table structure.

Comments

Anonymous
September 01, 2015
Looks very promising. What would be the recommended approach to alter the product reviews?

An update query which reads the content of the reviews field and add/update/delete a review?
Keep the ProductReview table and periodically (every X minutes/hours/days) update the Reviews column in the Product table?
Something else?

Anonymous
September 01, 2015
Very good and interesting approach. Helpful article for future deep DB understanding.
Anonymous
September 02, 2015
Jovan , This is helpful. How about creating JSON data type and indexing it? Is this being thought about, coming up in next release? One more question - if you have multiple schema embedded into into JSON text, how does OPENJSON handles it?
Anonymous
September 02, 2015
We had a long discussion about type vs plain text storage. Text is faster to load and read because we don't have transformation from binary to standard JSON format. Plain text representation automatically works with in-memory/column store tables, all external tools/drivers, etc. In several experiments and comparisons with XML type JSON is faster. In many current workloads are used only a few properties from JSON documents and JSON_VALUE/OPENJSON will deserialize only properties required in the query. JSON type would need to deserialize all properties even if they are not used (i.e. when XML type is deserialized from its binary form, DOM elements are created for all properties even if they are not used in the query). Our current streaming parser just seeks into JSON and takes fields that are needed in the queries. We will analyze queries that use JSON and if type is necessary it will be added. Regarding indexing you can use standard non clustered indexes on properties of JSON documents (using computed columns) and full text search indexes. DocumentDb creates automatic indexes on all properties in JSON; however, the latest recommendation from DocumentDb team is to use indexes on some subset of properties because indexing all properties might be overhead. Type and some automatic index will not be added in SQL Server 2016, but it might be added in some future release. Regards, Jovan
Anonymous
September 02, 2015
Hi Sébastien, I have added additional section about updates. Thanks for this comment it is important topic that should be discussed. Regards, Jovan
Anonymous
September 03, 2015
Thanks Jovan, now it's easier to get an idea of the full solution. In fact the "update Reviews JSON everytime" vs "use separate table instead or update Reviews JSON column periodically" will just depend on the reviews update frequency.
Anonymous
September 07, 2015
Very informative. I was wondering about the possible business cases that would require me to store JSON in a RDBMS attribute. Of course it is better to have the capability available than not. I am working on the opposite right now. We are looking at moving less "transactional" data to a platform such as MongoDB.
Anonymous
September 07, 2015
Hi Stephen, Reason why you would store data as JSON in a text column is probably the same as the reason why you are moving data to MongoDb. You have less transactional data, data that will not be changed, and you don't want to use heavy relational model where you need to create separate tables for every many to many relationship or every sub-entity if they are relatively immutable. Also you might want to avoid a lot of JOINs between tables you can store related entities as collections of JSON data. Performance will depend on your workloads but you have ability to choose which of these two concepts works for you and combine them. As an example, imagine that you want to load products with their tags, attachments, categories, etc. In the strict relational model you would need to load products, read assigned primary key (e.g. identity or guid) and then use it as a foreign key to load tags product categories, product attachments, etc. If tags and categories are stored as collection of JSON objects you have single bulk-load without any additional reads/inserts. Even if you need relational schema, you might use this structure as a staging table to quickly load products. Keep in mind that this alternative approach exists even in relational database and you might find use cases.
Anonymous
September 08, 2015
WHERE Production.ProductReview.ProductID = Production.Product.ProductID Is that the deprecated syntax "More than two-part column name"? msdn.microsoft.com/.../ms143729.aspx
Anonymous
September 29, 2015
In SQL Server 2016 transforming Table to JSON document and JSON document to Table is excellent but update record of JSON format column data is quite complicated but manageable.
Anonymous
February 22, 2016
thanks for the article.when can we expect json support for sql azure. can we expect in next version.
Anonymous
June 23, 2016
We don't have SQL Server 2016 yet. But what I have done is used a text type. Then I use entity framework to convert it to a real object for me. And have it converted back automatically to JSON when the data is persisted. Works like a charm. If you don't need a readable format in your columns you could even use binary or protobuf.We really hated the fact that for every little field we had to introduce heavy migration scripts. Whilst most of the data was just that data. Not used for any logic, filtering or things like that. Just had to be sent to the client updated and back in the database. So may of our tables have an key and a text field called Details.
Anonymous
December 01, 2017
The update logic is not that seamless as in select, which is rightly suggested by author... I would say Update is a real turn down.
- Anonymous
  December 01, 2017
  This is older post written for some SQL server 2016 CTP version.In the meantime is added JSON_MODIFY function that can insert, update, or delete a value on some JSON path and append new value/object to array. JSON_MODIFY is available in official SQL Server 2016 RTM version.

次の方法で共有