Summarizing Data Using CUBE
The CUBE operator generates a result set that is a multidimensional cube. A multidimensional cube is an expansion of fact data, or data that records individual events. The expansion is based on columns that the user wants to analyze. These columns are called dimensions. The cube is a result set that contains a cross tabulation of all the possible combinations of the dimensions.
The CUBE operator is specified in the GROUP BY clause of a SELECT statement. The select list contains the dimension columns and aggregate function expressions. The GROUP BY specifies the dimension columns and the keywords WITH CUBE. The result set contains all possible combinations of the values in the dimension columns, together with the aggregate values from the underlying rows that match that combination of dimension values.
For example, a simple table Inventory contains the following:
Item Color Quantity
-------------------- -------------------- --------------------------
Table Blue 124
Table Red 223
Chair Blue 101
Chair Red 210
The following query returns a result set that contains the Quantity
subtotal for all possible combinations of Item
and Color
:
SELECT Item, Color, SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item, Color WITH CUBE
Here is the result set.
Item Color QtySum
-------------------- -------------------- --------------------------
Chair Blue 101.00
Chair Red 210.00
Chair (null) 311.00
Table Blue 124.00
Table Red 223.00
Table (null) 347.00
(null) (null) 658.00
(null) Blue 225.00
(null) Red 433.00
The following rows from the result set are of special interest:
Chair (null) 311.00
This row reports a subtotal for all rows having the value Chair
in the Item
dimension. The value null
is returned for the Color
dimension to show that aggregate reported by the row includes rows with any value of the Color
dimension.
Table (null) 347.00
This row is similar, but reports the subtotal for all rows having Table
in the Item
dimension.
(null) (null) 658.00
This row reports the grand total for the cube. Both the Item
and Color
dimensions have the value null
. This shows that all values of both dimensions are summarized in the row.
(null) Blue 225.00
(null) Red 433.00
These two rows report the subtotals for the Color
dimension. Both have null
in the Item
dimension to show that the aggregate data came from rows having any value for the Item
dimension.
Using GROUPING to Distinguish Null Values
The null values generated by the CUBE operation present a problem: How can a NULL generated by the CUBE operation be distinguished from a NULL returned in the actual data? This is achieved by using the GROUPING function. The GROUPING function returns 0 if the column value came from the fact data, and 1 if the column value is a NULL generated by the CUBE operation. In a CUBE operation, a generated NULL represents all values. The SELECT statement can be written to use the GROUPING function to substitute the string ALL for any generated NULL. Because a NULL from the fact data indicates the data value is unknown, the SELECT can also be coded to return the string UNKNOWN for any NULL from the fact data. For example:
SELECT CASE WHEN (GROUPING(Item) = 1) THEN 'ALL'
ELSE ISNULL(Item, 'UNKNOWN')
END AS Item,
CASE WHEN (GROUPING(Color) = 1) THEN 'ALL'
ELSE ISNULL(Color, 'UNKNOWN')
END AS Color,
SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item, Color WITH CUBE
Multidimensional Cubes
The CUBE operator can be used to generate n-dimensional cubes, or cubes with any number of dimensions. A single dimension cube can be used to generate a total, for example:
SELECT CASE WHEN (GROUPING(Item) = 1) THEN 'ALL'
ELSE ISNULL(Item, 'UNKNOWN')
END AS Item,
SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item WITH CUBE
GO
This SELECT
statement returns a result set that shows both the subtotals for each value of Item
and the grand total for all values of Item
:
Item QtySum
-------------------- --------------------------
Chair 311.00
Table 347.00
ALL 658.00
SELECT statements that contain a CUBE with many dimensions can generate large result sets, because these statements generate rows for all combinations of the values in all the dimensions. These large result sets may contain too much data to be easily read and understood. One solution to this problem is to put the SELECT
statement into a view:
CREATE VIEW InvCube AS
SELECT CASE WHEN (GROUPING(Item) = 1) THEN 'ALL'
ELSE ISNULL(Item, 'UNKNOWN')
END AS Item,
CASE WHEN (GROUPING(Color) = 1) THEN 'ALL'
ELSE ISNULL(Color, 'UNKNOWN')
END AS Color,
SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item, Color WITH CUBE
The view can then be used to query only the dimension values of interest:
SELECT *
FROM InvCube
WHERE Item = 'Chair'
AND Color = 'ALL'
Item Color QtySum
-------------------- -------------------- --------------------------
Chair ALL 311.00
(1 row(s) affected)