T-SQL: Applying APPLY Operator
This article originates from the following MSDN Transact SQL Forum's question: Complex logic to be implemented in SQL - Please help! and I hope I made a pun with its title.
In my solution to the problem presented by the thread's originator I am going to show how to use OUTER APPLY operator to solve common problems.
Problem Description
The problem to be solved was the following:
Given this table:
CREATE TABLE Enrollments (
StudentId INT NOT NULL
,Enroll_Date DATE NOT NULL
,Class VARCHAR(30) NOT NULL
)
ALTER TABLE Enrollments ADD CONSTRAINT PK_Enrollments_StudentID_Enroll_Date PRIMARY KEY (
StudentId
,Enroll_Date
)
INSERT INTO Enrollments (
StudentId
,Enroll_Date
,Class
)
VALUES (
1001
,'20130101'
,'Dance'
)
,(
1001
,'20130401'
,'Swimming'
)
,(
1001
,'20130601'
,'Karate'
)
We would need to produce the following output:
Solution
The first idea that comes to mind is that since we would need to expand ranges of dates we would need a Calendar table with all the months. There are many common date related queries scenarios that benefit from the permanent Calendar table in each database, as well as a Numbers table. You may want to check this excellent article explaining why it is important to have such a Calendar table: Why should I consider a Calendar table? For this particular problem we only need to have one row per each month, so we can either generate such a table on the fly or select from our existing Calendar table. While working on this article I discovered that the database I used to create the Enrollments table didn't have a permanent Calendar table, so I used this quick script to generate it for the purpose of solving the original problem:
IF OBJECT_ID('tempdb..#Tally', N'U') IS NOT NULL DROP TABLE #Tally;
SELECT TOP 2000000 IDENTITY(INT, 1, 1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1
,Master.dbo.SysColumns sc2
CREATE UNIQUE CLUSTERED INDEX cx_Tally_N ON #Tally (N);
SELECT CAST(dateadd(month, N-1, '19000101') AS DATE) AS the_date
INTO dbo.Calendar
FROM #Tally T
WHERE N <= datediff(month, '19000101', '20200101');
So with that script we prepared the Calendar table with one row per each month from 01/01/1900 till 01/12/2019.
With that table in place I can now proceed with solving the problem we wanted to solve.
We need to create the start and end date for each enrollment and then join with the Calendar table to expand ranges. The start date is obviously the enrollment date and the end date is either the date one month prior to the next enrollment date for that Student or the first day of the current month. Therefore I used the obvious idea here which I have used many times in the past for the similar kind of the problems:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month, DATEDIFF(month, '19000101', CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT *
FROM cte;
I've added SELECT * FROM cte so we can examine our intermediate result and verify that it is correct logic.
Now we only need to add a JOIN to Calendar table to get the desired result with expanded ranges:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month, DATEDIFF(month, '19000101', CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT S.StudentId, Cal.the_date AS Enroll_Date, S.Class
FROM cte S INNER JOIN dbo.Calendar Cal ON Cal.the_date BETWEEN S.Start_Date AND S.End_Date;
SQL Server 2012 Solution
SQL Server 2012 and up offers a simpler alternative to the OUTER APPLY solution. In SQL Server 2012 the LEAD() and LAG() functions were introduced that allow us to avoid correlated subquery and transform that solution into this code:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,DATEADD(month, -1,LEAD(S.Enroll_Date, 1, DATEADD(day, 1, EOMONTH(CURRENT_TIMESTAMP))) OVER (PARTITION BY S.StudentId ORDER BY S.Enroll_Date)) AS End_Date
,S.Class
FROM Enrollments S
)
SELECT S.StudentId, Cal.the_date AS Enroll_Date, S.Class
FROM cte S INNER JOIN dbo.Calendar Cal ON Cal.the_date BETWEEN S.Start_Date AND S.End_Date;
In this solution I also used the new EOMONTH() function in order to advance one month from the current month for the default value in the LEAD function. Then we're subtracting one month from that expression as a whole.
Conclusion
In this article we learned how to apply simple T-SQL tricks to solve a problem. We learned 2 solutions - one which only works in SQL Server 2012 and above and another solution that may be used in prior versions of SQL Server.
Addendum
Today's Transact SQL Server MSDN Forum post "Dynamic Columns with some additional logic" is an interesting continuation of this article theme and also my other T-SQL: Dynamic Pivot on Multiple Columns article. In my reply to the thread's originator I hinted the possible solution using the ideas from both articles. Please leave a comment to this article if you want that case to become a new article or part of this article.
See you soon again with more interesting problems!
See Also
This article participated in the TechNet Guru for August competition and won the Silver Medal.