I have test data like the following:
DROP TABLE IF EXISTS #test;
CREATE TABLE #test (id INT, start_date DATE, end_date DATE)
INSERT INTO
#test
VALUES
(1, '2023-01-01', '2024-01-01'),
(1, '2023-05-01', '2024-07-01'),
(1, '2025-01-01', '2026-01-01');
SELECT *
FROM #test;
I am trying to create integer groups that check if the prior row connects with the current. If it doesn't connect, I want to increment the integer to notate that its a separate group.
I have tried below to check if the current row connects with the prior based on if the current period start_date
comes before the prior periods end date. this would signal that the time periods connect and be treated as the same group.
WITH connects AS (
SELECT
id,
CASE
WHEN LAG(id, 1) OVER (PARTITION BY id ORDER BY start_date) = id AND (
start_date <= LAG(end_date, 1) OVER (PARTITION BY id ORDER BY start_date)
OR
end_date <= LAG(end_date, 1) OVER (PARTITION BY id ORDER BY start_date)
) THEN 1
WHEN LAG(id, 1) OVER (PARTITION BY id ORDER BY start_date) IS NULL THEN 1
ELSE 0
END AS connects_flag
FROM
#test
)
SELECT
*
FROM
connects;
Desired outcome:
id | start_date | end_date | grp |
---|---|---|---|
1 | 2023-01-01 | 2024-01-01 | 1 |
1 | 2023-05-01 | 2024-07-01 | 1 |
1 | 2025-01-01 | 2026-01-01 | 2 |
Here's one way to do it to with cumulative aggregation:
In the common table expression I've used a combination of lag
and iif
to create a column that will contain 1
whenever the previous row's end date is either missing or smaller than the current row's start date, and 0
otherwise (so 0 when the previous row overlaps with the current, 1 if it doesn't).
Then all that's left to do is to commulatively sum the values of that column.
I've also added some more rows to your sample table:
DROP TABLE IF EXISTS #test;
CREATE TABLE #test (id INT, start_date DATE, end_date DATE)
INSERT INTO
#test
VALUES
(1, '2023-01-01', '2024-01-01'),
(1, '2023-05-01', '2024-07-01'),
(1, '2024-05-01', '2024-07-01'),
(1, '2024-08-01', '2024-09-01'),
(1, '2025-01-01', '2026-01-01'),
(1, '2026-05-01', '2026-07-01');
WITH CTE AS
(
SELECT id
, start_date
, end_date
, IIF(
ISNULL(
LAG(end_date) OVER(PARTITION BY id ORDER BY start_date)
, '0001-01-01') < start_date
, 1, 0) As grp
FROM #test
)
SELECT id
, start_date
, end_date
, SUM(grp) OVER(PARTITION BY id ORDER BY start_date) DateGroup
FROM CTE;
Results:
id | start_date | end_date | DateGroup |
---|---|---|---|
1 | 2023-01-01 | 2024-01-01 | 1 |
1 | 2023-05-01 | 2024-07-01 | 1 |
1 | 2024-05-01 | 2024-07-01 | 1 |
1 | 2024-08-01 | 2024-09-01 | 2 |
1 | 2025-01-01 | 2026-01-01 | 3 |
1 | 2026-05-01 | 2026-07-01 | 4 |
You can see a live demo on db<>fiddle