Jaa


PIVOT Query Plans

In my last post, I gave an overview of the PIVOT operator.  In this post, I'm going to take a look at the query plans generated by the PIVOT operator.  As we'll see, SQL Server generates a surprisingly simple query plan that is essentially just a fancy aggregation query plan.

Let's use the same schema and queries from my previous post:

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

SELECT [2005], [2006], [2007]
FROM (SELECT Yr, Sales FROM Sales) AS s
PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p

This query generates the following query plan:

   |--Compute Scalar(DEFINE:(
                 [Expr1006]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END,
                 [Expr1007]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END,
                 [Expr1008]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
       |--Stream Aggregate(DEFINE:(
                      [Expr1024]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1025]=SUM(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1026]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1027]=SUM(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1028]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1029]=SUM(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END)))
            |--Table Scan(OBJECT:([Sales]))

This is just a basic scalar aggregation query plan!  It calculates one SUM aggregate for each year.  Like any SUM aggregate, each aggregate actually computes both the count and the sum.  If the count is zero, the query plan returns NULL else it returns the sum.  (The compute scalar handles this logic.)

The only twist is that each SUM aggregate is actually computed over a CASE statement that filter for those rows that match the year for which it is summing sales.  The CASE statement returns the value of the Sales column for those rows that match the year and NULLs for all other rows.  To clarify what is happening, we can view the results of the CASE statements without the aggregation:

SELECT EmpId, Yr,
    CASE WHEN Yr = 2005 THEN Sales END AS [2005],
    CASE WHEN Yr = 2006 THEN Sales END AS [2006],
    CASE WHEN Yr = 2007 THEN Sales END AS [2007]
FROM Sales

 EmpId       Yr          2005                  2006                  2007
----------- ----------- --------------------- --------------------- ---------------------
1           2005        12000.00              NULL                  NULL
1           2006        NULL                  18000.00              NULL
1           2007        NULL                  NULL                  25000.00
2           2005        15000.00              NULL                  NULL
2           2006        NULL                  6000.00               NULL
3           2006        NULL                  20000.00              NULL
3           2007        NULL                  NULL                  24000.00
2           2007        NULL                  NULL                  NULL

When computing the sums of each of the year columns, the query plan relies on the fact that aggregate functions discard NULLs; that is, the NULL values are not included in the results.  Although this point may seem intuitive for a SUM aggregate, the significance is clearer for a COUNT aggregate:

CREATE TABLE T (A INT)
INSERT T VALUES(NULL)

-- Returns 1: the number rows in table T
SELECT COUNT(*) FROM T

-- Returns 0: the number of non-NULL values of column A
SELECT COUNT(A) FROM T

Note that we could just as easily have written the original query as:

SELECT
    SUM(CASE WHEN Yr = 2005 THEN Sales END) AS [2005],
    SUM(CASE WHEN Yr = 2006 THEN Sales END) AS [2006],
    SUM(CASE WHEN Yr = 2007 THEN Sales END) AS [2007]
FROM Sales

This query gets a nearly identical query plan.  The only visible difference is the use of an extra compute scalar to evaluate the CASE statements.

   |--Compute Scalar(DEFINE:(
                 [Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014] END,
                 [Expr1005]=CASE WHEN [Expr1015]=(0) THEN NULL ELSE [Expr1016] END,
                 [Expr1006]=CASE WHEN [Expr1017]=(0) THEN NULL ELSE [Expr1018] END))
       |--Stream Aggregate(DEFINE:(
                      [Expr1013]=COUNT_BIG([Expr1007]), [Expr1014]=SUM([Expr1007]),
                      [Expr1015]=COUNT_BIG([Expr1008]), [Expr1016]=SUM([Expr1008]),
                      [Expr1017]=COUNT_BIG([Expr1009]), [Expr1018]=SUM([Expr1009])))
            |--Compute Scalar(DEFINE:(
                           [Expr1007]=CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END,
                           [Expr1008]=CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END,
                           [Expr1009]=CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END))
                 |--Table Scan(OBJECT:([Sales]))

The other big difference between the PIVOT syntax and query plan and the alternative syntax and query plan is that the PIVOT query suppresses the following warning about NULLs:

Warning: Null value is eliminated by an aggregate or other SET operation.

We could also suppress this warning by executing the following statement:

SET ANSI_WARNINGS OFF

At this point, the query plan for a multi-row PIVOT operation should not come as a surprise:

SELECT EmpId, [2005], [2006], [2007]
FROM (SELECT EmpId, Yr, Sales FROM Sales) AS s
PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p

The query plan for this PIVOT operation uses CASE statements to compute the same intermediate result that we saw above.  Then, like any other GROUP BY query, it uses either stream or hash aggregate to group by EmpId and to compute the final result.  In this case, the optimizer chooses a stream aggregate.  Since we do not have an index to provide order, it must also introduce a sort.

   |--Compute Scalar(DEFINE:(
                 [Expr1007]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END,
                 [Expr1008]=CASE WHEN [Expr1027]=(0) THEN NULL ELSE [Expr1028] END,
                 [Expr1009]=CASE WHEN [Expr1029]=(0) THEN NULL ELSE [Expr1030] END))
       |--Stream Aggregate(GROUP BY:([Sales].[EmpId]) DEFINE:(
                      [Expr1025]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1026]=SUM(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1027]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1028]=SUM(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1029]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1030]=SUM(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END)))
            |--Compute Scalar(DEFINE:([Sales].[EmpId]=[Sales].[EmpId]))
                 |--Sort(ORDER BY:([Sales].[EmpId] ASC))
                      |--Table Scan(OBJECT:([Sales]))

To see how this query really is no different than any other GROUP BY query or to see some alternative query plans, try creating a clustered index on EmpId to eliminate the sort or using an OPTION(HASH GROUP) hint to force a hash aggregate.