Partager via


COLUMN STORE INDEXES

 

Some days ago I visit a client on Puerto Rico that was having some performance problems on a transactional application and after resolving the issue the client express interested on learning about Column Store Index, this kind of indexes is a new functionality on SQL Server 2012, as we will see his kind of index work great on some scenarios but not all. The SQL Server versions that support this functionality is SQL Server 2012 Enterprise, Evaluation and Developer Edition.

Traditional Indexes are stored by rows instead of by columns. This kind of storage is extremely efficient when you require to access the data on a row or a small amount of rows. However if you request all the rows or a really big range, this approach becomes not as effective.

  

The Store Column Indexes allow you to present a big range of rows by storing the data organizing it by columns. When you create the Column Store Index you usually include all the columns on a table, this will assure that all columns will

  

In a Column Store Index, instead of storing all column of a specific row, each column is store a part from each other with all the rows from that column. The benefit of this type of index is that only the columns and rows needed to reply a request will be read. Usually a Dataware House scenarios only a 15% of the columns of an index to obtain the result of a request.

There are two principal restriction to consider when working with Column Store Indexes. First a Column Store Index is read-only, once created you can modify the base table, this means operations like INSERT, UPDATE and DELETE are not allowed, for this reason is a good practice to use table partitioning to reduce that amount of data that need to be inserted on a Column Index Store and allow an easier reconstruction of an index when inserting new values to the table, because of this restriction the Column Index Store are more suited for scenarios that data don’t change frequently, like in Dataware House. The second restriction is that there can be only one Column Index Store per table, this limitation is not a real problem because you usually include ALL the columns of a table on the Column Index Store.

Another limitation is related to the creation time of the Column Index Store in comparison with a non-clustered Index, the average time could be from 2 to 3 times longer than the non-clustered index. However, despite the restrictions mentioned earlier, the Column Index Store can provide a great value on performance benefits, and also the compression you get from similar data on the same page.

The following data types can NOT be used on the Column Index Store: binary, varbinary, ntext, text, image, nvarchar(max), varchar(max), uniqueidentifier, rowversion, sql_variant, decimal (greater than 18 digits), datetimeoffset, xml, and data types based on CLR. Also the number of columns are restricted to 1024. Finally the index cannot be UNIQUE or CLUSTERED, contain Included Columns or have a defined order (Ascending or Descending)

The Column Index Store use their own compression technology, and that why they can be combined with the page and row compression data options. They can’t also be used on replication schemas, change tracking, change data capture or filestream. This technologies works on a Read/Write scenario and that why are not compatible with the read-only nature of the Column Index Store.

How to create a Column Index Store:

The creation of the Column Index Store can be done through T-SQL or SQL Server Management Studio

T-SQL

CREATE NONCLUSTERED COLUMNSTORE INDEX <ColumnStoreIndexName> ON<Table> (col1, col2, col3);

-- Create the columnstore index

CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]

ON [FactResellerSalesPtnd]

(

    [ProductKey], [OrderDateKey], [DueDateKey], [ShipDateKey], [CustomerKey], [EmployeeKey],

    [PromotionKey], [CurrencyKey], [SalesTerritoryKey], [SalesOrderNumber], [SalesOrderLineNumber],

    [RevisionNumber], [OrderQuantity], [UnitPrice], [ExtendedAmount], [UnitPriceDiscountPct],

    [DiscountAmount], [ProductStandardCost], [TotalProductCost], [SalesAmount], [TaxAmt], [Freight],

    [CarrierTrackingNumber], [CustomerPONumber], [OrderDate], [DueDate], [ShipDate]

);

 

Management Studio

  1. Through Managment Studio, use the Object Explorer to connect to the SQL Server Instance
  2. On the Object Explorer, expand the SQL Server Instance, the Database and the table where you want to create the Index.
  3.   Right Click on the mouse and select Non-Clustered Columnstore Index under the New Index option on Indexes.
  4. Give a name and select the columns involved on the Column Index Store. Press OK two times

Column Index Store and Partitioned Tables

Let’s create a partitioned table named FactResellerSalesPtnd using the following code from MSDN

Step #1: Create the table FactResellerSalesPtnd (A Partittioned versión of: FactResellerSales)

USE AdventureWorksDW2012;

GO

 

CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT

FOR VALUES (

    20050701, 20050801, 20050901, 20051001, 20051101, 20051201,

    20060101, 20060201, 20060301, 20060401, 20060501, 20060601,

    20060701, 20060801, 20060901, 20061001, 20061101, 20061201,

    20070101, 20070201, 20070301, 20070401, 20070501, 20070601,

    20070701, 20070801, 20070901, 20071001, 20071101, 20071201,

    20080101, 20080201, 20080301, 20080401, 20080501, 20080601,

    20080701, 20080801, 20080901, 20081001, 20081101, 20081201

)

GO

 

CREATE PARTITION SCHEME [ByOrderDateMonthRange]

AS PARTITION [ByOrderDateMonthPF]

ALL TO ([PRIMARY])

GO

 

-- Create a partitioned version of the FactResellerSales table

CREATE TABLE [dbo].[FactResellerSalesPtnd](

    [ProductKey] [int] NOT NULL,

    [OrderDateKey] [int] NOT NULL,

    [DueDateKey] [int] NOT NULL,

    [ShipDateKey] [int] NOT NULL,

    [CustomerKey] [int] NOT NULL,

    [EmployeeKey] [int] NOT NULL,

    [PromotionKey] [int] NOT NULL,

    [CurrencyKey] [int] NOT NULL,

    [SalesTerritoryKey] [int] NOT NULL,

    [SalesOrderNumber] [nvarchar](20) NOT NULL,

    [SalesOrderLineNumber] [tinyint] NOT NULL,

    [RevisionNumber] [tinyint] NULL,

    [OrderQuantity] [smallint] NULL,

    [UnitPrice] [money] NULL,

    [ExtendedAmount] [money] NULL,

    [UnitPriceDiscountPct] [float] NULL,

    [DiscountAmount] [float] NULL,

    [ProductStandardCost] [money] NULL,

    [TotalProductCost] [money] NULL,

    [SalesAmount] [money] NULL,

    [TaxAmt] [money] NULL,

    [Freight] [money] NULL,

    [CarrierTrackingNumber] [nvarchar](25) NULL,

    [CustomerPONumber] [nvarchar](25) NULL,

    OrderDate [datetime] NULL,

    DueDate [datetime] NULL,

    ShipDate [datetime] NULL

) ON ByOrderDateMonthRange(OrderDateKey);

GO

 

-- Using simple or bulk logged recovery mode, and then the TABLOCK

-- hint on the target table of the INSERT…SELECT is a best practice

-- because it causes minimal logging and is therefore much faster.

ALTER DATABASE AdventureWorksDW2012 SET RECOVERY SIMPLE;

GO

 

-- Copy the data from the FactResellerSales into the new table

INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK)

SELECT * FROM dbo.FactResellerSales;

GO

 

-- Create the columnstore index

CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]

ON [FactResellerSalesPtnd]

(

    [ProductKey],

    [OrderDateKey],

    [DueDateKey],

    [ShipDateKey],

    [CustomerKey],

    [EmployeeKey],

    [PromotionKey],

    [CurrencyKey],

    [SalesTerritoryKey],

    [SalesOrderNumber],

    [SalesOrderLineNumber],

    [RevisionNumber],

    [OrderQuantity],

    [UnitPrice],

    [ExtendedAmount],

    [UnitPriceDiscountPct],

    [DiscountAmount],

    [ProductStandardCost],

    [TotalProductCost],

    [SalesAmount],

    [TaxAmt],

    [Freight],

    [CarrierTrackingNumber],

    [CustomerPONumber],

    [OrderDate],

    [DueDate],

    [ShipDate]

);

 

Step #2: Let’s execute an query and confirm if the Column Index Store was used

  1. a.       Press Ctrl+M, or select Include Actual Execution Plan from the Query menú on SQL Server Management Studio to actívate a graphical representation of the actual execution plan.
  2. b.      On the Query Editor execute the following query:

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSalesPtnd

GROUP BY SalesTerritoryKey;

Performance Benefits for Column Index Store

Cost

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSales

GROUP BY SalesTerritoryKey;

 

-- Índice almacenado por columnas

SELECT SalesTerritoryKey, SUM (ExtendedAmount) AS SalesByTerritory

FROM FactResellerSalesPtnd

GROUP BY SalesTerritoryKey;

The relative cost of the Second Query (using a column index store) is 16% in comparison with the relative cost of the First Query (using a regular clústeres index)

Disk I/O

 

When executing the queries using STATISTICS IO ON (show the information related with the Disk I/O activity for the queries) you see a big improvement on the Second Query

 

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

 

SET STATISTICS IO ON

 

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSales

GROUP BY SalesTerritoryKey;

 

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSalesPtnd

GROUP BY SalesTerritoryKey;

 

SET STATISTICS IO OFF

 

 

(10 row(s) affected)

Table 'FactResellerSales'. Scan count 1, logical reads 2982, physical reads 2, read-ahead reads 2972, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 

(10 row(s) affected)

Table 'FactResellerSalesPtnd'. Scan count 1, logical reads 599, physical reads 4, read-ahead reads 235, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

Time

When executing the queries using STATISTICS IO ON (show the information in milliseconds related with the analyzing, compile and executing the queries) you see a time reduction on the Second Query

 

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

 

SET STATISTICS TIME ON

 

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSales

GROUP BY SalesTerritoryKey;

 

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory

FROM FactResellerSalesPtnd

GROUP BY SalesTerritoryKey;

 

SET STATISTICS TIME OFF

 

SQL Server Execution Times:

   CPU time = 46 ms, elapsed time = 109 ms.

 

SQL Server Execution Times:

   CPU time = 32 ms, elapsed time = 85 ms.

 

Conclusion

Experts agree that the performance improvement when using column index store on a table fluctuate between 10% and 100%. However as mentioned in this article applications that benefit the most are the ones with high read volumes like data warehousing, but not for high write scenarios like OLTP systems. The star and snow flake schemas usually take part on data warehousing and datamarts where the velocity on data extraction is more important that the efficiency on the data manipulation. The column index store can improve this type of scenarios that are ideal for this technology.