共用方式為


Re-summarizing data using UNPIVOT, ROW_NUMBER and CTEs

I had a question recently where a customer was presented a set of tables summarized by month and he was asked to re-summarize the data giving year to date summaries.

This was a “one time” request so we wanted to get results quickly and then refine the process once the customer decided how/if they wanted to report on the data more frequently. In my quest to eliminate cursors, loops and temp tables form my set-based life I wanted to present a solution that was contained in one SQL Query. As we will see below, there are some drawbacks to my approach, but I think this is a good illustration of using common table expressions, ranking functions and the UNPIVOT operator to solve a fairly common problem.

The structure of the source table was similar to:

 CREATE TABLE dbo.MonthlyData (
    VendorId int
    , Jan int
    , Feb int
    , Mar int
    , Apr int
    , May int
    , Jun int
    , Jul int
    , Aug int
    , Sep int
    , Oct int
    , Nov int
    , Dec int
    CONSTRAINT pkc_MonthlyData PRIMARY KEY CLUSTERED (VendorId)
    );

Let’s populate some sample data:

 SET NOCOUNT ON
DECLARE @i INT = 0
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.MonthlyData VALUES (@i, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    SELECT @i = @i + 1    
    
END
GO

The first step is to break out the data by month and vendor. We can use the UNPIVOT operator to flip the month’s columns to rows.

 SELECT *
  FROM (
SELECT VendorId
            , _01_Jan = Jan
            , _02_Feb = Feb
            , _03_Mar = Mar
            , _04_Apr = Apr
            , _05_May = May
            , _06_Jun = Jun
            , _07_Jul = Jul
            , _08_Aug = Aug
            , _09_Sep = Sep
            , _10_Oct = Oct
            , _11_Nov = Nov
            , _12_Dec = Dec
    FROM dbo.MonthlyData
    ) AS p
UNPIVOT
    (Amount FOR MonthAbbr IN 
        (_01_Jan, _02_Feb, _03_Mar, _04_Apr, _05_May, _06_Jun
         , _07_Jul, _08_Aug, _09_Sep, _10_Oct, _11_Nov, _12_Dec
         ) 
    ) a
        

We have prefixed each month name with a sortable number so that we can order the results when we calculate our totals. The result of the above looks like:

VendorId Amount MonthAbbr
1 1 _01_Jan
2 1 _02_Feb
3 1 _03_Mar
4 1 _04_Apr

We can now use this query as the input to a common table expression:

 WITH unpivoted_cte as (
    SELECT VendorId
            , MonthAbbr
            , Amount 
            , MonthId = row_number() over(partition by VendorId order by MonthAbbr)
    FROM 
        (SELECT VendorId
                , _01_Jan = Jan
                , _02_Feb = Feb
                , _03_Mar = Mar
                , _04_Apr = Apr
                , _05_May = May
                , _06_Jun = Jun
                , _07_Jul = Jul
                , _08_Aug = Aug
                , _09_Sep = Sep
                , _10_Oct = Oct
                , _11_Nov = Nov
                , _12_Dec = Dec
        FROM dbo.MonthlyData) AS p
    UNPIVOT
        (Amount FOR MonthAbbr IN 
            (_01_Jan, _02_Feb, _03_Mar, _04_Apr, _05_May, _06_Jun
             , _07_Jul, _08_Aug, _09_Sep, _10_Oct, _11_Nov, _12_Dec
         )
    )AS unpvt
    )
SELECT base.MonthId
        , base.VendorId
        , base.Amount
  FROM unpivoted_cte base

The MonthAbbr is converted to an ID of 1-12.

Now we can calculate a year to date and a rolling three month calculation for a given vendor…

 

 WITH unpivoted_cte as (
    SELECT VendorId
            , MonthAbbr
            , Amount 
            , MonthId = row_number() over(partition by VendorId order by MonthAbbr)
    FROM 
        (SELECT VendorId
                , _01_Jan = Jan
                , _02_Feb = Feb
                , _03_Mar = Mar
                , _04_Apr = Apr
                , _05_May = May
                , _06_Jun = Jun
                , _07_Jul = Jul
                , _08_Aug = Aug
                , _09_Sep = Sep
                , _10_Oct = Oct
                , _11_Nov = Nov
                , _12_Dec = Dec
        FROM dbo.MonthlyData) AS p
    UNPIVOT
        (Amount FOR MonthAbbr IN 
            (_01_Jan, _02_Feb, _03_Mar, _04_Apr, _05_May, _06_Jun
             , _07_Jul, _08_Aug, _09_Sep, _10_Oct, _11_Nov, _12_Dec
         )
    )AS unpvt
    )
SELECT base.MonthId
        , base.VendorId
        , base.Amount
        , YTD = ytd.Amount
        , Rolling3 = rolling3.Amount
  FROM unpivoted_cte base
  LEFT JOIN (
        SELECT base.VendorId
                , base.MonthId 
                , Amount = sum(compare.amount)
          FROM unpivoted_cte base
          JOIN unpivoted_cte compare
            ON base.vendorId = compare.vendorId
           AND base.monthId >=     compare.monthId
         GROUP BY base.VendorId
                    , base.MonthId
        ) ytd
    ON base.VendorId = ytd.VendorId
   AND base.MonthId = ytd.MonthId
  LEFT JOIN (
        SELECT base.VendorId
                , base.MonthId 
                , Amount = sum(compare.amount)
          FROM unpivoted_cte base
          JOIN unpivoted_cte compare
            ON base.vendorId = compare.vendorId
           AND compare.monthId between base.MonthId - 2 and base.MonthId
         GROUP BY base.VendorId
                    , base.MonthId
        ) rolling3
    ON base.VendorId = rolling3.VendorId
   AND base.MonthId = rolling3.MonthId
 WHERE base.VendorId = 999
 ORDER BY base.VendorId, base.MonthId

GO

Here we add two left joins that each access the CTE. In the case of YTD we have:

  LEFT JOIN (
        SELECT base.VendorId
                , base.MonthId 
                , Amount = sum(compare.amount)
          FROM unpivoted_cte base
          JOIN unpivoted_cte compare
            ON base.vendorId = compare.vendorId
           AND base.monthId >=     compare.monthId
         GROUP BY base.VendorId
                    , base.MonthId
        ) ytd
    ON base.VendorId = ytd.VendorId
   AND base.MonthId = ytd.MonthId

We use the CTE as a base and join the CTE to itself where the “compare” month is less than the base month. So, for MondId 4 in the “base” we will return months 1, 2, 3 and 4. We then sum the amounts grouping by vendor and base month.

The result of this query is:

MonthId VendorId Amount YTD Rolling3
1 999 1 1 1
2 999 1 2 2
3 999 1 3 3
4 999 1 4 3

There are a couple of open issues with this approach:

  • We are assuming that there is only one per vendor. If we have more than one year we would need to modify the unpivot and use the year in the rolling 3 month and the YTD calculations.

  • This is great for a one time re-summarization; however, there are other approaches including using SSIS to unpivot if you are loading data from a flat file and breaking the query into multiple steps and populating a work table that holds the unpivoted data.

In my opinion, if you find yourself performing these types of queries in the context of an application or reporting system and not just for one time data analysis then you should look closely at your data structures and update your schema to give you access to the data in a more usable form. That said, this may be a good stop gap as you are re-architecting an existing solution or you have encountered a solution where you cannot change these data structures.

Comments

  • Anonymous
    July 13, 2011
    Awesome-- This is useful in every financial and accounting applicaiton