Jaa


Aggregation WITH CUBE

In my last post, I wrote about how aggregation WITH ROLLUP works.  In this post, I will discuss how aggregation WITH CUBE works.  Like the WITH ROLLUP clause, the WITH CUBE clause permits us to compute multiple "levels" of aggregation in a single statement.  To understand the difference between these two clauses, let's look at an example.  We'll use the same fictitious sales data from last week's example.

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

Consider the following query from last week:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

It will be easier to see what is happening if we pivot the sales data:

 

Yr

2005

2006

2007

ALL

EmpId

1

12000.00

18000.00

25000.00

55000.00

2

15000.00

6000.00

 

21000.00

3

 

20000.00

24000.00

44000.00

ALL

 

 

 

120000.00

The table clearly shows that the WITH ROLLUP clause computes the total for each employee for all years and the grand total for all employees and all years.  The query does not compute the totals for each year for all employees.  Moreover, the order of the columns in the GROUP BY clause determines in which order the data is totaled.

Now let's repeat the same query but replace the WITH ROLLUP clause with a WITH CUBE clause:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE

This query computes all possible sub-totals and totals:

 

Yr

2005

2006

2007

ALL

EmpId

1

12000.00

18000.00

25000.00

55000.00

2

15000.00

6000.00

 

21000.00

3

 

20000.00

24000.00

44000.00

ALL

27000.00

44000.00

49000.00

120000.00

Because the WITH CUBE clause causes the query to compute all possible totals, the order of the columns in the GROUP BY clause does not matter.  Of course, by default, SQL Server does not pivot the results of either of the above queries.  Here is the actual output from the WITH CUBE query:

 EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00
NULL        2005        27000.00
NULL        2006        44000.00
NULL        2007        49000.00

Next, let's look at the query plan for the WITH CUBE query:

  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
       |--Concatenation
            |--Stream Aggregate(GROUP BY:([Sales].[EmpId], [Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
            |    |--Sort(ORDER BY:([Sales].[EmpId] ASC, [Sales].[Yr] ASC))
            |         |--Table Spool
            |              |--Stream Aggregate(GROUP BY:([Sales].[Yr], [Sales].[EmpId]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
            |                   |--Sort(ORDER BY:([Sales].[Yr] ASC, [Sales].[EmpId] ASC))
            |                        |--Table Scan(OBJECT:([Sales]))
            |--Compute Scalar(DEFINE:([Expr1012]=NULL))
                 |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
                      |--Table Spool

This plan consists of two parts.  SQL Server has effectively rewritten our query as follows:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP
UNION ALL
SELECT NULL, Yr, SUM(Sales)
FROM Sales
GROUP BY Yr

The first part of the plan computes the result for the WITH ROLLUP query above.  I described how this query plan works in last week's post.  The second part of this plan computes the missing year sub-totals yielding the entire CUBE result.  Note that this plan employs a common sub-expression spool.  As I discussed in this post, a common sub-expression spool copies its input rows into a worktable and then reads and returns the rows from the worktable multiple times - in this case twice.  The spool is meant to improve performance although, in this example, it has little impact since the server could just as easily have re-read the original Sales table.  However, if the input to the aggregation was more complex and cost more to evaluate, the spool would help.

If we use the WITH CUBE clause when aggregating on more than two columns, SQL Server simply generates increasingly complex plans with additional inputs to the concatentation operator.  As with the simple two column example, the idea is to compute the whole CUBE by computing all of the individual ROLLUPs that compose it.

Finally, we can actually combine WITH CUBE and PIVOT to generate the above table in a single simple statement.  (I actually proposed a variation of this query in an answer to a reader's comment on my post about the PIVOT operator but I like this solution better.)

SELECT EmpId, [2005], [2006], [2007], [ALL]
FROM
      (
      SELECT
            CASE WHEN GROUPING(EmpId) = 0
                  THEN CAST (EmpId AS CHAR(7))
                  ELSE 'ALL'
            END AS EmpId,
            CASE WHEN GROUPING(Yr) = 0
                  THEN CAST (Yr AS CHAR(7))
                  ELSE 'ALL'
            END AS Yr,
            SUM(Sales) AS Sales
      FROM Sales
      GROUP BY EmpId, Yr WITH CUBE
      ) AS s
PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007], [ALL])) AS p

Here is the output from this query:

 EmpId   2005                  2006                  2007                  ALL
------- --------------------- --------------------- --------------------- ---------------------
1       12000.00              18000.00              25000.00              55000.00
2       15000.00              6000.00               NULL                  21000.00
3       NULL                  20000.00              24000.00              44000.00
ALL     27000.00              44000.00              49000.00              120000.00

Comments

  • Anonymous
    October 11, 2007
    In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE . SQL Server 2008 continues

  • Anonymous
    October 25, 2007
    One of the most important uses of CTEs is to write recursive queries. In fact, CTEs provide the only

  • Anonymous
    August 19, 2010
    Nice post.

  • Anonymous
    November 24, 2010
    please tell how to display pivot tables using graph

  • Anonymous
    November 29, 2010
    You'll need to use a tool such as Excel or Reporting Services to display the results of a query as a graph.  Both tools can link directly to the data stored in SQL Server. HTH, Craig

  • Anonymous
    April 19, 2011
    Thanx a lot again....

  • Anonymous
    November 10, 2013
    It's working fine. But i have adding another one column EmpName that showing problem?

  • Anonymous
    November 11, 2013
    I'm not certain that I understand your question.  You can certainly add other columns to the table and include them in the query.  Remember that any column that you select that is not an argument to an aggregate function must be including in the group by list.  For example: SELECT EmpId, EmpName, Yr, SUM(Sales) AS Sales FROM Sales GROUP BY EmpId, EmpName, Yr WITH CUBE HTH, Craig

  • Anonymous
    January 13, 2015
    Hi Craig, very interesting and useful post and I have the following question: In your example the number or Years [Yr] is known and they are specified in the Select and the Pivot, in my case the number of years is uncertain, because in some cases it can be 10 in other 35 depending on the record set I extract. Is there a way to populate the Years [Yr] dynamically from the result set? Regards

  • Anonymous
    January 14, 2015
    The comment has been removed

  • Anonymous
    January 21, 2015
    The comment has been removed