Extreme 25x compression of JSON data using CLUSTERED COLUMNSTORE INDEXES
CLUSTERED COLUMNSTORE INDEXES (CCI) provide extreme data compression. In Azure SQL Database and SQL Server vNext you can create CCI on tables with NVARCHAR(MAX) columns. Since JSON is stored as NVARCHAR type, now you can store huge volumes of JSON data in tables with CCI. In this post, I will show you how you can get 25x compression on a table that contains JSON/NVARCHAR(MAX) column using CCI.
In this experiment, I will use publicly available ContosoDW database where we have FactOnlineSales table with 12 million records. This fact table is related to a number of dimension tables such as DimStore, DimCustomer, DimProduct, etc., as it is shown on the following figure:
Imagine that all these related tables are stored in NoSQL-style as a single JSON document. In that case we would have a single table that will have some Data NVARCHAR(MAX) column where we would store data from all related tables as JSON text - something like:
DROP TABLE IF EXISTS SalesRecords
GO
CREATE TABLE SalesRecords(
OrderNumber nvarchar(20) not null,
ProductKey int not null,
StoreKey int not null,
Date datetime not null,
CustomerKey int not null,
Data nvarchar(max) not null,
INDEX cci CLUSTERED COLUMNSTORE
)
In this example, I will keep int key columns as separate columns and store all other columns from FactOnlineSales table and all columns from the related tables in a single Data column. The query that will de-normalize all related dimensions into a single column is at the end of this post, and it looks like:
INSERT INTO SalesRecords(OrderNumber, StoreKey, ProductKey, Date, CustomerKey, Data)
SELECT FactOnlineSales.SalesOrderNumber, FactOnlineSales.StoreKey, FactOnlineSales.ProductKey, FactOnlineSales.DateKey, FactOnlineSales.CustomerKey,
(SELECT ... FROM DimensionTables FOR JSON PATH) as Data
I'm joining all related dimension tables, format them as JSON text using FOR JSON clause and loading everything into SalesRecords table. This query will populate a table with CCI index.
I will also create a copy of this table but without CCI (plain heap table) using the following query:
select *
into SalesRecordsRS
from SalesRecords
Now, I will compare sizes of the table with CCI and the table without CCI using the following query:
exec sp_spaceused 'SalesRecordsRS'
exec sp_spaceused 'SalesRecords'
The results of these queries are shown below:
A table without CCI has 101.020.920 KB, while the table with CCI has only 4.047.128 KB in data column. With CCI we can compress 100GB table to 4GB with 24.96 compression ratio!
Compression is not important only for storage savings. The following query on a table with CCI this query is executed in 46 seconds, while on a heap table execution takes 13 min 45 seconds.
select min(datalength(data)), avg(datalength(data)), max(datalength(data))
from SalesRecords
Smaller disk io and batch execution provided by CCI enables you to run 18x faster queries.
Conclusion
CLUSTERED COLUMNSTORE INDEXES provide extreme data compression in SQL Server and Azure SQL Database. With NVARCHAR(MAX) support in CCI indexes you can use them on your JSON data stored is database and get high 25x compression. Therefore, CCI is a perfect solution if you need to store a large volume of JSON data in your SQL Database.
ContosoDW database is publicly available for download, so you can use this database and the script below to re-create this table and try this in your environment.
SQL Script used to populate the table is:
INSERT INTO SalesRecords(OrderNumber, StoreKey, ProductKey, Date, CustomerKey, Data)
SELECT FactOnlineSales.SalesOrderNumber, FactOnlineSales.StoreKey, FactOnlineSales.ProductKey, FactOnlineSales.DateKey, FactOnlineSales.CustomerKey,
(SELECT FactOnlineSales.PromotionKey,
FactOnlineSales.SalesOrderLineNumber, FactOnlineSales.SalesQuantity, FactOnlineSales.SalesAmount, FactOnlineSales.CurrencyKey,
FactOnlineSales.ReturnQuantity, FactOnlineSales.ReturnAmount, FactOnlineSales.DiscountQuantity, FactOnlineSales.DiscountAmount, FactOnlineSales.TotalCost,
FactOnlineSales.UnitCost, FactOnlineSales.UnitPrice,
DimProduct.ProductName AS [Product.Name], DimProduct.ProductDescription AS [Product.Description], DimProduct.Manufacturer AS [Product.Manufacturer],
DimProduct.BrandName AS [Product.Brand], DimProduct.ClassName AS [Product.Class], DimProduct.StyleName AS [Product.Style],
DimProduct.ColorName AS [Product.Color], DimProduct.Size AS [Product.Size], DimProduct.SizeRange AS [Product.SizeRange],
DimProduct.Weight AS [Product.Weight], DimProduct.UnitCost AS [Product.UnitCost], DimProduct.UnitPrice AS [Product.UnitPrice],
DimProduct.ImageURL AS [Product.ImageURL], DimProduct.ProductURL AS [Product.URL],
DimProductSubcategory.ProductSubcategoryLabel AS [Product.SubcategoryLabel], DimProductSubcategory.ProductSubcategoryName AS [Product.SubcategoryName],
DimProductSubcategory.ProductSubcategoryDescription AS [Product.SubcategoryDescription], DimProductCategory.ProductCategoryLabel AS [Product.CategoryLabel],
DimProductCategory.ProductCategoryName AS [Product.CategoryName], DimProductCategory.ProductCategoryDescription AS [Product.CategoryDescription],
DimCustomer.CustomerLabel AS [Customer.Label], DimCustomer.Title AS [Customer.Title], DimCustomer.FirstName AS [Customer.FirstName],
DimCustomer.MiddleName AS [Customer.MiddleName], DimCustomer.LastName AS [Customer.LastName], DimCustomer.NameStyle AS [Customer.NameStyle],
DimCustomer.BirthDate AS [Customer.BirthDate], DimCustomer.MaritalStatus AS [Customer.MaritalStatus], DimCustomer.Suffix AS [Customer.Suffix],
DimCustomer.Gender AS [Customer.Gender], DimCustomer.EmailAddress AS [Customer.EmailAddress], DimCustomer.YearlyIncome AS [Customer.YearlyIncome],
DimCustomer.TotalChildren AS [Customer.TotalChildren], DimCustomer.NumberChildrenAtHome AS [Customer.NumberChildrenAtHome],
DimCustomer.Education AS [Customer.Education], DimCustomer.Occupation AS [Customer.Occupation],
DimCustomer.HouseOwnerFlag AS [Customer.HouseOwnerFlag], DimCustomer.AddressLine1 AS [Customer.AddressLine1],
DimCustomer.NumberCarsOwned AS [Customer.NumberCarsOwned], DimCustomer.AddressLine2 AS [Customer.AddressLine2],
DimCustomer.Phone AS [Customer.Phone], DimCustomer.CompanyName AS [Customer.CompanyName], DimGeography_1.CityName AS [Customer.CityName],
DimGeography_1.StateProvinceName AS [Customer.StateProvinceName], DimGeography_1.RegionCountryName AS [Customer.RegionCountryName],
DimGeography_1.ContinentName AS [Customer.ContinentName], DimGeography_1.GeographyType AS [Customer.GeographyType],
JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimGeography_1.Geometry.STX,',', DimGeography_1.Geometry.STY,']}}')) AS [Customer.Geometry],
DimCurrency.CurrencyName AS [Currency.Name], DimCurrency.CurrencyDescription AS [Currency.Description],
DimCurrency.CurrencyLabel AS [Currency.Label], DimPromotion.PromotionLabel AS [Promotion.Label], DimPromotion.PromotionName AS [Promotion.Name],
DimPromotion.PromotionDescription AS [Promotion.Description], DimPromotion.DiscountPercent AS [Promotion.DiscountPercent],
DimPromotion.PromotionType AS [Promotion.Type], DimPromotion.PromotionCategory AS [Promotion.Category], DimPromotion.StartDate AS [Promotion.StartDate],
DimPromotion.EndDate AS [Promotion.EndDate], DimPromotion.MinQuantity AS [Promotion.MinQuantity], DimPromotion.MaxQuantity AS [Promotion.MaxQuantity],
DimStore.StoreName AS [Store.Name], DimStore.StoreDescription AS [Store.Description], DimStore.StoreManager AS [Store.Manager],
DimStore.StoreType AS [Store.Type], DimStore.Status AS [Store.Status], DimStore.OpenDate AS [Store.OpenDate], DimStore.CloseDate AS [Store.CloseDate],
DimStore.ZipCode AS [Store.ZipCode], DimStore.ZipCodeExtension AS [Store.ZipCodeExtension], DimStore.StorePhone AS [Store.Phone],
DimStore.StoreFax AS [Store.Fax], DimStore.AddressLine1 AS [Store.AddressLine1], DimStore.AddressLine2 AS [Store.AddressLine2],
JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimStore.Geometry.STX,',', DimStore.Geometry.STY,']}}')) AS [Store.Geometry],
JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimStore.GeoLocation.Lat,',', DimStore.GeoLocation.Long,']}}')) AS [Store.GeoLocation],
DimGeography.CityName AS [Store.CityName],
DimGeography.StateProvinceName AS [Store.StateProvinceName], DimGeography.RegionCountryName AS [Store.RegionCountryName],
DimGeography.ContinentName AS [Store.ContinentName], DimGeography.GeographyType AS [Store.GeographyType],
JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimGeography.Geometry.STX,',', DimGeography.Geometry.STY,']}}')) AS [Store.Geo.Location],
DimGeography.GeographyKey AS [Store.GeographyKey], DimEntity.EntityLabel AS [Store.Entity.Label],
DimEntity.EntityName AS [Store.Entity.Name], DimEntity.EntityDescription AS [Store.Entity.Description], DimEntity.EntityType AS [Store.Entity.Type],
DimEntity.Status AS [Store.Entity.Status], DimDate.FullDateLabel AS [Date.FullDateLabel], DimDate.DateDescription AS [Date.DateDescription],
DimDate.CalendarYear AS [Date.CalendarYear], DimDate.CalendarMonthLabel AS [Date.CalendarMonthLabel], DimDate.FiscalYear AS [Date.FiscalYear],
DimDate.FiscalMonth AS [Date.FiscalMonth], DimDate.FiscalYearLabel AS [Date.FiscalYearLabel], DimDate.CalendarYearLabel AS [Date.CalendarYearLabel],
DimDate.CalendarHalfYear AS [Date.CalendarHalfYear], DimDate.CalendarHalfYearLabel AS [Date.CalendarHalfYearLabel], DimDate.Datekey AS [Date.Datekey],
DimDate.CalendarQuarter AS [Date.CalendarQuarter], DimDate.CalendarQuarterLabel AS [Date.CalendarQuarterLabel],
DimDate.CalendarMonth AS [Date.CalendarMonth], DimDate.CalendarWeek AS [Date.CalendarWeek], DimDate.CalendarWeekLabel AS [Date.CalendarWeekLabel],
DimDate.CalendarDayOfWeekLabel AS [Date.CalendarDayOfWeekLabel], DimDate.CalendarDayOfWeek AS [Date.CalendarDayOfWeek],
DimDate.FiscalHalfYear AS [Date.FiscalHalfYear], DimDate.FiscalHalfYearLabel AS [Date.FiscalHalfYearLabel], DimDate.FiscalQuarter AS [Date.FiscalQuarter],
DimDate.FiscalQuarterLabel AS [Date.FiscalQuarterLabel], DimDate.FiscalMonthLabel AS [Date.FiscalMonthLabel]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) as Data
FROM FactOnlineSales INNER JOIN
DimDate ON FactOnlineSales.DateKey = DimDate.Datekey INNER JOIN
DimStore ON FactOnlineSales.StoreKey = DimStore.StoreKey INNER JOIN
DimProduct ON FactOnlineSales.ProductKey = DimProduct.ProductKey INNER JOIN
DimPromotion ON FactOnlineSales.PromotionKey = DimPromotion.PromotionKey INNER JOIN
DimCurrency ON FactOnlineSales.CurrencyKey = DimCurrency.CurrencyKey INNER JOIN
DimCustomer ON FactOnlineSales.CustomerKey = DimCustomer.CustomerKey INNER JOIN
DimGeography ON DimStore.GeographyKey = DimGeography.GeographyKey INNER JOIN
DimProductSubcategory ON DimProduct.ProductSubcategoryKey = DimProductSubcategory.ProductSubcategoryKey INNER JOIN
DimProductCategory ON DimProductSubcategory.ProductCategoryKey = DimProductCategory.ProductCategoryKey INNER JOIN
DimGeography DimGeography_1 ON DimCustomer.GeographyKey = DimGeography_1.GeographyKey INNER JOIN
DimEntity ON DimStore.EntityKey = DimEntity.EntityKey
Comments
- Anonymous
February 11, 2017
Hi there,I'm using SQL Server 2016 Dev. SP1You can't create nvarchar(max) in cci (I changed compatibility level to 2016) - you will get an error.https://msdn.microsoft.com/en-us/library/gg492153.aspxThanks- Anonymous
February 14, 2017
From above: "In Azure SQL Database and SQL Server vNext you can create CCI on tables with NVARCHAR(MAX) columns". 2016 SP1 is not mentioned. - Anonymous
February 14, 2017
Hi, LOBS in CCI are supported in SQL Server vNext and Azure SQL Database. This feature is not available in SQL Server 2016.- Anonymous
February 18, 2017
Thank you
- Anonymous
- Anonymous