SQL gaps and island issue

Raj D 591 Reputation points
2025-01-21T05:59:20.4966667+00:00

Greetings!!!

We're working on migrating data from a legacy application and have a requirement to get all the consecutive SESSION START_DATE & END_DATE. For the SESSION_ID - 8642, SESSION_TYPE - 3256, based on the requirement using sql server query for the version 2014 get corresponding records of START_DATE when the date range has gaps when the date range is consecutive and there are multiple records with the same for instance START_DATE '2022-02-17' has 2 records need to get the corresponding records with the latest SESSION_ENTER_DATE. Same with START_DATE '2022-06-02' and '2024-01-02'.

User's image

From the above resultset, I need to exclude START_DATE '2022-03-03' as it would fall in between START_DATE '2022-02-17' & END_DATE '2022-05-12' also START_DATE '2022-06-16' as it would fall in between START_DATE '2022-06-02' & END_DATE '2022-07-27'.

DDL and data and query is below.

Query

Expected Output:

Check for consecutive date range if yes then get the first START_DATE and the corresponding END_DATE, PAID, SESSION_ID, SESSION_TYPE, SESSION_ENTER_DATE. If there's multiple START_DATE for instance START_DATE '2022-02-17' & END_DATE '2022-05-12' get the latest SESSION_ENTER_DATE '2022-07-27' and its corresponding records. Need to exclude START_DATE '2022-03-03' END_DATE '2022-03-09' as it would fall in between START_DATE '2022-02-17' & END_DATE '2022-05-12'

User's image

as START_DATE '2022-06-02' & END_DATE '2022-07-27' has latest SESSION_ENTER_DATE '2022-07-27' so need get the corresponding rest of the column values.User's image

as START_DATE '2024-01-02' & END_DATE '2024-01-07' has latest SESSION_ENTER_DATE '2024-01-19' so need get the corresponding rest of the column values.User's image

The output would look like below.

START_DATE END_DATE PAID SESSION_ID SESSION_TYPE SESSION_ENTER_DATE
2022-02-17 2022-05-12 29.27 8642 3256 2022-07-27
2022-06-02 2022-07-27 48.72 8642 3256 2022-07-27
2023-04-19 2023-04-25 61.19 8642 3256 2023-04-19
2024-01-02 2024-01-07 34.97 8642 3256 2024-01-19
2024-02-22 2024-04-11 18 8642 4963 2024-03-29
2016-04-13 2016-04-26 32.53 7415 3256 2016-04-20
2017-11-11 2017-12-08 43.59 7415 4963 2017-12-05
2018-03-31 2018-04-27 43.59 7415 4963 2018-04-17
2018-06-23 2018-07-20 43.59 7415 4963 2018-07-05
2018-12-09 2018-12-28 37.02 7415 4963 2018-11-29
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,423 questions
SQL Server Transact-SQL
SQL Server Transact-SQL
SQL Server: A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.Transact-SQL: A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
114 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,688 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Erland Sommarskog 116.4K Reputation points MVP
    2025-01-22T22:43:05.1233333+00:00

    I changed the first CTE to read:

    ;WITH RankedSessions AS (
        SELECT 
            START_DATE,
            END_DATE,
            PAID,
            SESSION_ID,
            SESSION_TYPE,
            SESSION_ENTER_DATE,
            ROW_NUMBER() OVER (
                PARTITION BY SESSION_ID, SESSION_TYPE, START_DATE 
                ORDER BY SESSION_ENTER_DATE DESC
            ) AS rn
        FROM [SESSION]  A
        WHERE NOT EXISTS (SELECT * 
                          FROM   SESSION B
                          WHERE  A.SESSION_ID = B.SESSION_ID
                            AND  A.SESSION_TYPE = B.SESSION_TYPE
                            AND  A.START_DATE >= B.START_DATE
                            AND  A.END_DATE <= B.END_DATE
                            AND  NOT (A.START_DATE = B.START_DATE AND A.END_DATE = B.END_DATE))
    ),
    

    This still have one more row than your desired result. But then again, I fail to see why that row should be


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.