T-SQL: Gaps and Islands Problem
This article will consider a simple classical Gaps & Islands problem asked recently in Transact-SQL Forum at MSDN with non-original title "Query Help".
Problem Definition
The thread originator was kind enough to provide DDL of the table and some data to describe the task:
Create table T1
(Id int identity primary key,
VoucherNo varchar(4),
TransNo varchar(10)
)
Insert into T1 values ('V100','Trns1'),('V101','Trns1'),('V102','Trns1'),('V103','Trns1'),('V104','Trns1'),('V106','Trns1')
And he also provided the desired output:
TransNo | FirstVoucher | LastVoucher | Quantity |
Trns1 | V100 | V104 | 5 |
Trns1 | V106 | V106 | 1 |
The problem is to find consecutive vouchers (100-104, then 106).
Solution
As mentioned, this is a common problem in Transact-SQL and it was described by Itzik Ben-Gan here or by Plamen Ratchev in this easy to understand blog post Refactoring Ranges. Knowing the main idea of the solution it is easy to provide it assuming that all voucher numbers come in the following format (letter V following by the 3 digit number):
;WITH cte
AS (
SELECT *
,CAST(SUBSTRING(VoucherNo, 2, 3) AS INT) - ROW_NUMBER() OVER (
ORDER BY VoucherNo
) AS Grp
FROM T1
)
SELECT TransNo
,min(VoucherNo) AS FirstVoucherNo
,max(VoucherNo) AS LastVoucherNo
,count(*) AS Quantity
FROM cte
GROUP BY TransNo
,Grp
So, the idea of this solution is to group consecutive ranges first using ROW_NUMBER() function and then apply aggregate functions based on that group identifier.
Note, it is easy to modify this query to work with different formats of the voucher number (say, some combinations of letters following by any length number). This article is concentrating on the problem posted by the thread originator and is solving it for the particular voucher number format. You may want to see some modifications of my solution suggested by Ronen Ariely in the original thread.
Another Example
I am here adding another example of Gaps and Islands problem that was asked in the T-SQL Forum "find the missing date from output data".
Here is the DDL and sample data.
CREATE TABLE Person
(PersonID INT IDENTITY(1,1) PRIMARY KEY,
PersonName varchar(3),
SigninDate date
);
INSERT INTO Person VALUES
('AAA','20130301'),
('AAA','20130302'),
('AAA','20130305'),
('AAA','20130306'),
('AAA','20130307'),
('BBB','20130301'),
('BBB','20130302'),
('BBB','20130303'),
('BBB','20130305'),
('BBB','20130307');
The task was to find the missing dates in the given data. So here is the expected result.
PersonName | Date |
AAA | 2013-03-03 |
AAA | 2013-03-04 |
BBB | 2013-03-04 |
BBB | 2013-03-06 |
So the first task was to group the data on the 'PersonName', assign a row number to identify each element and then find the gaps in each these groups.
/*
Solution for a *Pre* SQL 2012 version
The first CTE groups the PersonName and assign a group number and
a row identifier for each group
*/
WITH CTE_GROUPS AS
(SELECT *,
DENSE_RANK() OVER (ORDER BY PersonName) AS GroupID,
ROW_NUMBER() OVER(PARTITION BY PersonName ORDER BY SigninDate) RowNum
FROM Person
),
/*
The second CTE will find the border values for the GAPs
*/
CTE_GAPS AS
(SELECT C2.PersonName,
C2.SigninDate GapStart,
C1.SigninDate GapEnd
FROM CTE_GROUPS C1 INNER JOIN CTE_GROUPS C2 ON
C1.GroupID=C2.GroupID and C1.RowNum=C2.RowNum+1
AND ISNULL(DATEDIFF(DAY,C2.Signindate,C1.SigninDate),0)>1
)
SELECT * FROM CTE_GAPS
Here is how you could do the same in SQL 2012 using LAG function
WITH CTE_GAPS AS
(SELECT Personname,
LAG(P.SigninDate) Over(Partition by P.PersonName Order By P.PersonName,p.Signindate) GapStart ,
SigninDate GapEnd,
(DATEDIFF(DAY,LAG(P.SigninDate) Over(Partition by P.PersonName Order By P.PersonName,p.Signindate),P.SigninDate)-1) GapDays
FROM Person P)
SELECT P.PersonName,
C.CalendarDate
FROM MASTER..Calendar C CROSS JOIN CTE_GAPS P
WHERE
GapDays>0 AND
C.CalendarDate
BETWEEN DATEADD(DAY,1,P.GapStart ) AND DATEADD(DAY,-1,P.GapEnd)
See Also
- Transact-SQL Portal
- Group Islands of Contiguous Dates (SQL Spackle) - interesting article by Jeff Moden
- [[T-SQL Useful Links]]
This entry participated in the Technology Guru TechNet WiKi for July contest and won the gold prize.