Share via


SQL Server: Trickle Loading with Columnstore Indexes

At first glance it would appear you can't do trickle load and real-time query with columnstore indexes because you can't directly insert data into a table with a columnstore index. But you can handle trickle load and maintain very fast query performance if you are willing to do some scripting and change your queries. You can have a logical fact table composed of two fact tables, one with a columnstore index, and another, much smaller one (a "delta" table that acts as a change file) structured as a B-tree or heap. Then your queries can do local-global aggregation, aggregating the data from the columnstore and from the delta table in separate queries, and combining the results with a second level of grouping and aggregation. Here's a short example illustrating the approach.

create table  Fact(DateKey int, Measure numeric(18,2)); 
insert Fact values(20110908, 1); 
insert Fact values(20110909, 2); 
 
create nonclustered columnstore index cs on Fact(DateKey,Measure); 
 
-- Find total sales by day 
select DateKey, sum(Measure) from  Fact group  by DateKey; 
 
-- Create table to hold trickle updates 
create table  FactDelta(DateKey int, Measure numeric(18,2)); 
 
-- Add a new row (a "trickle update" to Fact) 
insert FactDelta values(20110909, 2); 
 
-- Find total sales by day from the fact data consisting of 
-- Fact plus FactDelta. 
with FirstLevelAgg (DateKey, Measure) 
    as (select DateKey, sum(Measure) from  Fact group  by DateKey 
    UNION ALL
    select DateKey, sum(Measure) from  FactDelta group  by DateKey) 
select DateKey, sum(Measure) from  FirstLevelAgg group  by DateKey

As long as FactDelta is much smaller than Fact, the query above will run extremely fast because most  of the work of the query is done by the highlighted subquery, which will run completely in batch mode. Since FactDelta is small, it won't contribute much to the total cost of the query.

You would periodically (say every night) move data from the delta table to the columnstore table. Batch-mode query execution doesn't work consistently across UNION ALL (yet), so creating a view combining the columnstore table and delta table and querying the view won't work as well as the approach described above. For a discussion of how to use the UNION ALL operator and still consistently get the benefits of batch mode, see this topic. It elaborates on the local-global aggregation technique used here.

Return to main SQL Server columnstore index tuning page.