I have the following data:
date unit status
2023-04-30 unit1 1
2023-05-31 unit1 1
2023-08-31 unit1 1
2023-09-30 unit1 1
2023-11-30 unit1 1
2023-12-31 unit1 1
2024-01-31 unit1 1
2024-02-28 unit1 1
For a reference date I would like to know the length of the first upcoming "streak" (on MSSQL, used for production, and sqlite, used for unit tests)
For date 2023-05-15 my desired output is:
unit streak
unit1 3
The reason for this is that the first month with status=1 after 2023-05 is 2023-08, and then I just count for each consecutive month.
For date 2023-11-01 my desired output is:
unit streak
unit1 3
The reason is that the first month with status=1 after 2023-11 is 2023-12, and the streak ends on 2024-02 as months with status=0 are not recorded, and the next month with status=1 is more that a month away.
This is a gaps and islands
problem that can be resolved by (value minus row_number), since that is invariant within a consecutive sequence. The start and end dates are just the MIN() and MAX() of the group :
WITH cte as (
SELECT *, GroupingSet = FORMAT(DATEADD(
MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]),
[date]
), 'yyyy-MM-01')
FROM mytable
WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
)
SELECT TOP 1 unit,
StartDate = MIN([date]),
EndDate = MAX([date]),
streak = COUNT(*)
FROM CTE
GROUP BY unit, GroupingSet
ORDER BY StartDate;
NB : The giving date have been converted to the first day of the month so the GroupingSet can be matched withing same year/month !
———
Using the row_number method, we can easily obtain the top streak for each unit when considering several units:
with cte AS (
SELECT *, GroupingSet = FORMAT(DATEADD(
MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]),
[date]
), 'yyyy-MM-01')
FROM mytable
WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
),
cte2 AS (
SELECT unit,
StartDate = MIN([date]),
EndDate = MAX([date]),
streak = COUNT(*)
FROM CTE
GROUP BY unit, GroupingSet
),
cte3 as (
SELECT *, row_number() over (partition by unit order by streak desc) as rn
FROM cte2
)
SELECT unit, StartDate, EndDate, streak
FROM cte3
WHERE rn = 1