Udostępnij za pośrednictwem


More Fun with SQL Server “Denali” Window Functions

In my last post I described scenarios where the improvements to the SQL Server “Denali” windows functions could address a common scenario of getting year to date sales from a list of monthly amounts.

In this post we will explore a couple of other advances in the Windows functions and how they might be applied to other scenarios.

The first scenario uses the AdvnetureWorksDWDenali database located at https://msftdbprodsamples.codeplex.com/releases/view/55330

In this example, let’s look at the Sales.InternetSales table. Some common reporting requests might be:

1) What is the year to date sales by month for a given sales territory?
2) What were the average sales over the last 3 and 6 months shown as moving averages?
3) What is a given month’s sales as a percentage of the year’s total sales?
4) What is the year to date wales by month as a percentage of the tear’s total sales?
5) What is a given sales territory’s sales as a percentage for all sales in a given month?

Now, prior to the advancements in the window functions, some of these queries would be extremely difficult to do against a reporting database in an efficient manner. Chances are you would need to resort to temporary tables, creating work tables, scanning the base table multiple times, or bringing huge amounts of data into a client app where you could process the data more efficiently but incur large transfer times to move the data to your application tier.

So, what does this look like in SQL Server “Denali”?

WITH Sales_CTE AS (
SELECT SalesTerritoryKey
, OrderYear = YEAR(orderdate)
, OrderMonth = MONTH(orderdate)
, SalesAmount = SUM(SalesAmount)
FROM dbo.FactInternetSales
WHERE CurrencyKey = 100
GROUP BY SalesTerritoryKey, year(orderdate), month(orderdate)
)
SELECT SalesTerritoryKey
, OrderYear
, OrderMonth
, SalesAmount
, YTDSales = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
, Last3MonthAverageSales = Avg(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
, Last6MonthAverageSales = Avg(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
)
, CurrentMonthPctOfYear = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderYear, OrderMonth
ROWS CURRENT ROW
)
/ sum(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
)
, YTDPCTOfYear = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
/ sum(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
)
, SalesTerritoryAsPctOfMonthlyTotal = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS CURRENT ROW
)
/ sum(SalesAmount)
OVER (
PARTITION BY OrderYear, OrderMonth
)
FROM Sales_CTE
ORDER BY OrderYear, OrdermOnth, SalesTerritoryKey

Let’s break this statement apart…..

In the first section we create a common table expression that gives us a view of the orders grouped by year, month and sales territory:

WITH Sales_CTE AS (
SELECT SalesTerritoryKey
, OrderYear = YEAR(orderdate)
, OrderMonth = MONTH(orderdate)
, SalesAmount = SUM(SalesAmount)
FROM dbo.FactInternetSales
WHERE CurrencyKey = 100
GROUP BY SalesTerritoryKey, year(orderdate), month(orderdate)
)

This could have been accomplished with a derived table or a view… I have gotten in the habit of using common table expressions as this is the way that I think about the problem - first, group the data; then, query the data.

SELECT SalesTerritoryKey
, OrderYear
, OrderMonth
, SalesAmount



FROM Sales_CTE
ORDER BY SalesTerritoryKey, OrderYear, OrdermOnth

Here I am just writing the bulk of the query… leaving out the fun bits so that we can talk about them 1 by 1.
1) Year to Date Sales by month and territory.

, YTDSales = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderMonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
In this example we are creating a set of windows on the data defined by SalesTerritoryKey and OrderYear. Within this window we will order the data month. What we present to the aggregate function is the set or rows in that window from the first row and the row that is being evaluated.

2) 3 and 6 Month Moving Average

, Last3MonthAverageSales = Avg(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
, Last6MonthAverageSales = Avg(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
)
In this case we would like to present a different window to the aggregate. In Year to Date calculations we want to create a window bounded by the order year, in this case, we would like to create a larger window that encompasses all of the order for the sales territory. We order that data by year and month and then return the current row + 2 or 5 months prior depending on which average we are calculating.

3 and 4) Current Month and YTD Sales as a Percentage of Total Year’s Sales

, CurrentMonthPctOfYear = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderYear, OrderMonth
ROWS CURRENT ROW
)
/ SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
)
, YTDPCTOfYear = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
ORDER BY OrderYear, OrderMonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
/ SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey, OrderYear
)
In this case you can see that the numerator and the denominator of these calculations are both sums of sales amounts just over different windows.

5) Sales Territory Sales as a Pertcentage of the Current Month’s sales

, SalesTerritoryAsPctOfMonthlyTotal = SUM(SalesAmount)
OVER (
PARTITION BY SalesTerritoryKey
ORDER BY OrderYear, OrderMonth
ROWS CURRENT ROW
)
/ sum(SalesAmount)
OVER (
PARTITION BY OrderYear, OrderMonth
)
This calculation acts orthogonally to the other calculations in the list! All the other calculations looked at the calculations within a sales territory, this looks at all of the sales territories. Notice that the denominator of this calculation partitions by OrderYear and OrderMonth while all the other functions partition by SalesTerritoryKey.

This is great for running totals which work for sales – but what about account balances? How do I find the opening or closing balance of a period? What if I would like to see year to date growth based on current balance compared to the year’s opening balance? For this, we will shift to the dbo.FactFinance table that tracks account balances for AdventureWorks.

In this query we will look at the AdventureWorks Northeast Organization’s Cash Account Balances.

SELECT Date
, Amount
, YTDGrowth = (Amount
/ FIRST_VALUE(Amount)
                              OVER (
PARTITION BY YEAR(Date)
ORDER BY MONTH(Date)
)
) - 1
, Last3MonthGrowth = (Amount
/ FIRST_VALUE(Amount)
OVER (
PARTITION BY AccountKey
ORDER BY YEAR(Date), MONTH(Date)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
) - 1
, Last6MonthGrowth = (Amount
/ FIRST_VALUE(Amount)
OVER (
PARTITION BY AccountKey
ORDER BY YEAr(Date), MONTH(Date)
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
)
) - 1
FROM dbo.FactFinance
WHERE AccountKey = 4
AND OrganizationKey = 3

Here we use the FIRST_VALUE function to obtain the first cash balance for the year (in year to date growth) or the first balance in a window of 3 or 6 months for the other calculations.

There are other new window functions lag, lead and last_value that we did not cover today, but I hope that you start to see the patterns around how to use these functions and the general usage pattern to apply them to the problems you face today.