Partilhar via


Columnstore index: Why do we refer to it as In-Memory Analytics?

Columnstore index is part of In-Memory technologies released in SQL Server 2014 and SQL Server 2016. Please refer to https://blogs.msdn.microsoft.com/sqlserverstorageengine/tag/columnstore-index/ for a series of blogs on columnstore index.

As the name In-Memory Analytics implies, you may think that columnstore index is entirely memory resident but it is not really the case for columnstore indexes created on disk-based tables. In fact, the columnstore, as the picture below shows, is divided into one or more rowgroups containing up to 1 million rows and each column within a rowgroup, referred to as segment, is stored independently as Blobs. These blobs are stored just like any other blob datatype and fetched into memory when accessed. This is goodness because for multi-terabyte data warehouses, it will require large amount of physical memory to keep the full columnstore index in memory. For example, if the size of the data warehouse is 30TB, you may require 3TB of physical memory assuming you get 10x compression.

columnstore-structure

Additionally, many data warehouses keep data for 10 years or longer for compliance reasons. Most older data is cold and is rarely accessed. Many organizations manage this large volume of data using table partitioning. It makes little sense to require those partitions, if stored as columnstore index, to be memory resident. Hopefully, this has convinced you that it indeed is a not a good idea to require columnstore index to be memory resident especially for data warehouses (DW).

 So why do we still refer to columnstore index as In-Memory Analytics? Though I agree that it misleading, there are two supporting points as described below. I do realize that for the purist this is not a convincing argument, but at least you will understand why this has been branded as In-Memory Analytics.

  • Historical. The columnstore was first introduced in SQL Server 20088R2 as part of Power Pivot in Excel. That implementation requires data to be memory resident.
  • Columnstore provides very high data compression and only require loading column(s) referenced in the query. Many analytic workloads using columnstore index, under steady state, require minimal or no physical IO.

To make things a bit more complex, the columnstore index created on memory-optimized tables is indeed required to be fully memory resident. This is not a big issue considering that memory-optimized tables are fully memory resident and, if you assume 10x compression, the columnstore index would require 10% additional memory.

Thanks

Sunil Agarwal

SQL Server Tiger Team

Twitter | LinkedIn

Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam