Perform UNION ALL and Still Get the Benefit of Batch Processing
A UNION ALL operator can sometimes, but not always, prevent the batch mode of execution from working in your query when there is a columnstore index on one or more of the tables involved in the UNION ALL. You can work around this by writing separate queries (or subqueries or common table expressions (CTEs)) over each table that contributes to the UNION ALL, and combining the results.
For example, dbo.Purchase is a 101 million row fact table with a columnstore index. We create a one-row table with the same schema, dbo.DeltaPurchase, as follows:
select top 1 *
into DeltaPurchase
from dbo.Purchase;
Now, we create a view that combines dbo.Purchase and dbo.DeltaPurchase, as follows:
create view vPurchase
as
select * from dbo.Purchase
union all
select * from dbo.DeltaPurchase;
We also use smaller dimension tables dbo.Date and dbo.Media. This query combines dbo.Purchase and dbo.DeltaPurchase using UNION ALL (via the view definition), joins with the Date dimension, and runs mostly in batch mode in a fraction of a second on a 4-core machine:
select p.date, d.DayNumOfMonth, count(*)
from dbo.vPurchase as p,
dbo.Date d
where p.Date = d.DateId
group by p.date, d.DayNumOfMonth;
This query doesn't run in batch mode, and it takes half a minute or so to run:
select p.date, d.DayNumOfMonth, m.Genre, count(*)
from dbo.vPurchase p,
dbo.Date d,
dbo.Media m
where p.Date = d.DateId
and m.MediaId = p.MediaId
group by p.date, d.DayNumOfMonth, m.Genre
You can work around this by writing two separate subqueries (as CTEs) and combining the results, like so:
with CTE1 (date, DayNumOfmonth, Genre, c)
as
(
select p.date, d.DayNumOfMonth, m.Genre, count(*) c
from dbo.Purchase p,
dbo.Date d,
dbo.Media m
where p.Date = d.DateId
and m.MediaId = p.MediaId
group by p.date, d.DayNumOfMonth, m.Genre
),
CTE2 (date, DayNumOfmonth, Genre, c)
as
(
select p.date, d.DayNumOfMonth, m.Genre, count(*) c
from dbo.DeltaPurchase p,
dbo.Date d,
dbo.Media m
where p.Date = d.DateId
and m.MediaId = p.MediaId
group by p.date, d.DayNumOfMonth, m.Genre
),
CTE3 (date, DayNumOfMonth, Genre, c)
as
(
select * from CTE1
UNION ALL
select * from CTE2
)
select t.date, t.DayNumOfmonth, t.Genre, sum(c) as c
from CTE3 as t
group by t.date, t.DayNumOfmonth, t.Genre;
This query runs in under a second, and gets the same result as the previous query. It does most of its work in batch mode while computing CTE1. Then the remaining work of the query operates on a relatively small number of rows. The key idea is to duplicate the aggregate query logic over each branch of the UNION ALL, and then have a final, top-level query to combine the aggregates. This is known as local-global aggregation. It can be complex, but if you must get the absolute best performance when UNION ALL interferes with batch processing, it is a way to do it. Microsoft is working to make batch processing work automatically in queries like this, sometime after the SQL Server 2012 release.