Using non-relational models in SQL Server
In this post I will explain how you can use both relational and non-relational data models in your database schema.
Problem
Relational database schema might contain a lot of tables that require many JOINs or separate queries to retrieve all necessary data in the query. Also, in order to insert new records, we need to update many table, and in many case we need to follow parent/child order defined by referential integrity, read primary key of the parent row before we insert child row, etc.
NoSQL databases mitigate some of these problems by adding all related information in the single entity. This approach reduces complexity of model so we don’t need to join different entities because all information are contained in a single entry. Also, inserts are simple – new record with all related information is simply inserted in one entity.
Using the same approach in relational database might also simplify schema and improve performance of some workloads.
Solution
SQL Server enables you to combine relational and non-relational (JSON, XML) data in order to define optimal data model for your workloads. If you have a structure where primary object (row) has single related entity or collections of related entities that you don’t want to represent them as relational schema, you can store related information as JSON documents. The same approach use NoSQL databases.
Data model
In this scenario we will use SalesOrderHeader table from AdventureWorks database as an example. SalesOrderHeader table has a number of related tables that contain additional information about the sales order such as order items, customer data, salesman info, shipping and billing information, etc. Design of these tables in AdventureWorks database is shown in the following figure:
Although this design follows the standard principles for modeling standard relational schema, it might not be optimal for this use case.
This schema is optimized for workloads that have frequent updates in child tables. However, once we create sales order with all related items, it is unlikely that someone will constantly update quantities and unit prices of the items, sales reasons, shipping methods, customer/salesman info, etc. Information related to sales order are mostly read-only so we are not getting some benefits from normalization. The only case when someone will update this information are error correction cases.
To demonstrate how this structure can be represented in non-relational schema, in Adventure Works 2016 CTP3 database is added non-relational version of SalesOrderHeader table called SalesOrder_json that represents a copy of standard SalesOrderHeader table with following changes:
- Added column OrderItems that contains an array of SalesOrderDetail rows formatted as JSON array
- Added column sales reasons that will contain JSON array of strings
- Added column Info that contains various information about sales order (customer, salesman, shipping method, shipping address, etc.)
Now we can store information in JSON columns:
ID | .... | Info | SalesReasons | OrderItems |
43697 | {"ShippingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9","Province":"British Columbia","TerritoryID":6,"Method":"XRQ - TRUCK GROUND","ShipBase":3.9500,"ShipRate":0.9900},"BillingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9"}} | ["Manufacturer","Quality"] | [{"Item":{"Qty":1,"Price":3578.2700,"Discount":0.0000,"Total":3578.270000},"Product":{"Number":"BK-R93R-62","Name":"Road-150 Red, 62"}}] |
If you need to get information about the sales order and related items, you can do it with a single query. If your client expects JSON as response (e.g. Ajax client in web browser) this might be perfect solution for you because all information will be prepared in the format you need. Also, inserts and data load might be faster if you prepare data in JSON format on the client side because you will not need to update multiple tables in the same transaction. You just need to update one row in the sales order table.
Note that you are not loosing query capabilities. SQL Server provides functions JSON_VALUE, JSON_QUERY, and OPENJSON that can access any value in JSON columns.
If you want to access values in JSON using standard column syntax you can add a non-persisted computed column that returns value on some paths, e.g.:
ALTER TABLE Sales.SalesOrder_json
ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
Now you can reference JSON value as any other real column via vCustomerName column.
If you need relational view on the array of JSON objects stored in OrderItems column you can create view that looks exactly as relational SalesOrderDetails table:
CREATE VIEW Sales.vwSalesOrderItems_json
AS
SELECT SalesOrderID, SalesOrderNumber, OrderDate,
CarrierTrackingNumber, OrderQty, UnitPrice,
UnitPriceDiscount, LineTotal, ProductNumber, Name
FROM Sales.SalesOrder_json
CROSS APPLY
OPENJSON (OrderItems)
WITH ( CarrierTrackingNumber NVARCHAR(20),
OrderQty int '$.Item.Qty',
UnitPrice float '$.Item.Price',
UnitPriceDiscount float '$.Item.Discount',
LineTotal float '$.Item.Total',
ProductNumber NVARCHAR(20) '$.Product.Number',
Name NVARCHAR(50) '$.Product.Name'
)
Queries executed on this view will return the same results as queries on the real SalesOrderDetails table.
Conclusion
SQL Server enables you to make trade-offs between relational and non-relational modeling in order to find the optimal data model for your workload. In SQL server you can create your database schema based on characteristics of your workload, using some of the following guidelines:
- If your workload is read-heavy and you need to reduce complexity of schema and number of JOINs, you can format information as JSON.
- If you need to load hierarchical data into database and you don’t want to update multiple tables, read auto generated keys/identity/ sequence for each parent table to load child rows, you can store all information in the single row and load each record using a single insert.
- If you have heavy transnational workload where related information are frequently changed you can use relational model that gives you best performance in this case.