sqlsql-serversqlitewindow-functionsconsecutive-months

How do I calculate length of streak?


I have the following data:

date          unit     status
2023-04-30    unit1    1
2023-05-31    unit1    1
2023-08-31    unit1    1
2023-09-30    unit1    1
2023-11-30    unit1    1
2023-12-31    unit1    1
2024-01-31    unit1    1
2024-02-28    unit1    1

For a reference date I would like to know the length of the first upcoming "streak" (on MSSQL, used for production, and sqlite, used for unit tests)

Example 1:

For date 2023-05-15 my desired output is:

unit     streak
unit1    3

The reason for this is that the first month with status=1 after 2023-05 is 2023-08, and then I just count for each consecutive month.

Example 2:

For date 2023-11-01 my desired output is:

unit     streak
unit1    3

The reason is that the first month with status=1 after 2023-11 is 2023-12, and the streak ends on 2024-02 as months with status=0 are not recorded, and the next month with status=1 is more that a month away.


Solution

  • This is a gaps and islands problem that can be resolved by (value minus row_number), since that is invariant within a consecutive sequence. The start and end dates are just the MIN() and MAX() of the group :

    WITH cte as (
      SELECT  *, GroupingSet = FORMAT(DATEADD(
                                 MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]), 
                                 [date]
                               ), 'yyyy-MM-01')
      FROM    mytable
      WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
    )
    SELECT  TOP 1 unit,
            StartDate = MIN([date]),
            EndDate = MAX([date]),
            streak = COUNT(*)
    FROM    CTE
    GROUP BY unit, GroupingSet
    ORDER BY StartDate;
    

    NB : The giving date have been converted to the first day of the month so the GroupingSet can be matched withing same year/month !

    Demo here

    ———

    Using the row_number method, we can easily obtain the top streak for each unit when considering several units:

    with cte AS (
      SELECT  *, GroupingSet = FORMAT(DATEADD(
                                 MONTH, - ROW_NUMBER() OVER(PARTITION BY unit ORDER BY [date]), 
                                 [date]
                               ), 'yyyy-MM-01')
      FROM    mytable
      WHERE [date] > EOMONTH('2023-05-15') AND [status] = 1
    ),
    cte2 AS (
      SELECT  unit,
            StartDate = MIN([date]),
            EndDate = MAX([date]),
            streak = COUNT(*)
      FROM    CTE
      GROUP BY unit, GroupingSet
    ),
    cte3 as (
      SELECT *, row_number() over (partition by unit order by streak desc) as rn
      FROM cte2
    )
    SELECT unit, StartDate, EndDate, streak
    FROM cte3
    WHERE rn = 1
    

    Demo here