
SQL Find consecutive numbers in groups


I have a table similar to the one shown below. It contains user IDs, an Hour value for each hour of the day, and an Avail flag that indicates whether the user is available during that hour.

I need to list all user IDs that are available for at least @n consecutive hours.

#####################
# UID # Avail # Hour#
#####################
# 123 #   1   #  0  #
# 123 #   1   #  1  #
# 123 #   0   #  2  #
# 123 #   0   #  3  #
# 123 #   0   #  4  #
# 123 #   1   #  5  #
# 123 #   1   #  6  #
# 123 #   1   #  7  #
# 123 #   1   #  8  #
# 341 #   1   #  0  #
# 341 #   1   #  1  #
# 341 #   0   #  2  #
# 341 #   1   #  3  #
# 341 #   1   #  4  #
# 341 #   0   #  5  #
# 341 #   1   #  6  #
# 341 #   1   #  7  #
# 341 #   0   #  8  #
#####################
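To try the queries in this thread, the sample data can be loaded with a script like the following. The int column types and the primary key are assumptions, since the question doesn't give the table definition; the name table1 matches the queries in the answer.

```sql
-- Recreate the question's sample table (column types are assumed)
IF OBJECT_ID('dbo.table1', 'U') IS NOT NULL
    DROP TABLE dbo.table1;

CREATE TABLE dbo.table1 (
    UID   int NOT NULL,
    Avail int NOT NULL,          -- 1 = available, 0 = not available
    Hour  int NOT NULL,          -- hour of the day, 0-23
    PRIMARY KEY (UID, Hour)      -- at most one row per user per hour
);

INSERT INTO dbo.table1 (UID, Avail, Hour) VALUES
    (123, 1, 0), (123, 1, 1), (123, 0, 2), (123, 0, 3), (123, 0, 4),
    (123, 1, 5), (123, 1, 6), (123, 1, 7), (123, 1, 8),
    (341, 1, 0), (341, 1, 1), (341, 0, 2), (341, 1, 3), (341, 1, 4),
    (341, 0, 5), (341, 1, 6), (341, 1, 7), (341, 0, 8);
```

Avail is declared int rather than bit on purpose: SQL Server's SUM() does not accept a bit argument, so with a bit column you would need SUM(CAST(Avail AS int)) in the summing queries.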

This should result in the following output for @n = 3:

#######
# UID #
#######
# 123 #
#######

I have attempted to use ROW_NUMBER() OVER (PARTITION BY UID, Avail ORDER BY UID, Hour) to number each row within its UID and availability flag. However, this does not work: availability can switch on and off several times a day, and partitioning by Avail only gives each user two running counts (one for available hours, one for unavailable hours), so separate periods of availability are not told apart.
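Here is that numbering applied to UID 341 on its own, with an inline VALUES list standing in for the table. Because Avail is part of the partition, all available hours share one sequence, so nothing in rn marks where one run of availability ends and the next begins:

```sql
-- The attempted ROW_NUMBER(), applied to UID 341's rows only
SELECT UID, Hour, Avail,
       ROW_NUMBER() OVER (PARTITION BY UID, Avail ORDER BY UID, Hour) AS rn
FROM (VALUES (341, 1, 0), (341, 1, 1), (341, 0, 2), (341, 1, 3), (341, 1, 4),
             (341, 0, 5), (341, 1, 6), (341, 1, 7), (341, 0, 8)) AS t (UID, Avail, Hour)
ORDER BY Hour;
-- Avail = 1 rows (hours 0, 1, 3, 4, 6, 7) get rn 1..6 and
-- Avail = 0 rows (hours 2, 5, 8) get rn 1..3, so the gaps are invisible.
```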


Solution

  • If you're using SQL Server 2012+, you could use a windowed SUM. However, the number of rows in the window frame has to be written into the query in advance, because the frame bounds don't accept variables, so it's not very flexible:

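    ;with cte as 
    (
        -- For each hour, count the available hours in the 3-row frame
        -- starting at that hour (assumes one row per UID and Hour, no gaps)
        select distinct 
           UID, 
           SUM(avail) over (partition by uid 
                            order by hour 
                            rows between current row and 2 following
                           ) cnt 
        from table1
    )
    select uid from cte where cnt = 3;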
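Listing the windowed SUM per hour for UID 123 shows what that query filters on (inline VALUES rows stand in for table1). Each value is the number of available hours in the three-row frame starting at that hour, so 3 marks the start of three consecutive available hours. Because the frame counts rows rather than hours, this relies on every hour having a row:

```sql
-- Per-hour frame sums for UID 123
SELECT UID, Hour,
       SUM(Avail) OVER (PARTITION BY UID
                        ORDER BY Hour
                        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS cnt
FROM (VALUES (123, 1, 0), (123, 1, 1), (123, 0, 2), (123, 0, 3), (123, 0, 4),
             (123, 1, 5), (123, 1, 6), (123, 1, 7), (123, 1, 8)) AS t (UID, Avail, Hour)
ORDER BY Hour;
-- cnt for hours 0..8: 2, 1, 0, 1, 2, 3, 3, 2, 1
-- Hours 5 and 6 reach 3, so UID 123 qualifies for @n = 3.
```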
    

    If you need that flexibility, you could wrap it in a stored procedure that uses dynamic SQL to build and execute the statement, something like this:

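    create procedure testproc (@n int) as
    begin
        set nocount on;
        -- The frame size must be a literal, so @n is spliced into the statement text.
        -- @n is an int, so this cannot inject arbitrary SQL; it must be >= 1,
        -- otherwise the generated frame ("-1 following") is invalid.
        declare @sql nvarchar(max);
        set @sql = concat('
            ;with cte as 
            (
               select distinct 
                  UID, 
                  SUM(avail) over (partition by uid 
                                order by hour 
                                rows between current row and ', @n - 1 , ' following
                                ) cnt 
               from table1
            )
            select uid from cte where cnt = ' , @n , ';');
        exec sp_executesql @sql;
    end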
    

    You can then run it with execute testproc 3.
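If you need the result inside another query, INSERT ... EXEC can capture the procedure's output into a table variable. This assumes testproc has been created as above and table1 holds the question's sample data; @available is just an illustrative name:

```sql
-- Capture the procedure's result set into a table variable
DECLARE @available TABLE (UID int);

INSERT INTO @available (UID)
EXEC testproc @n = 4;

-- With the sample data only UID 123 has a run of 4 available hours (5-8)
SELECT UID FROM @available;
```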

    An even less flexible option is correlated subqueries, since you have to add another EXISTS subquery for each additional consecutive hour:

    select distinct uid 
    from Table1 t1
    where Avail = 1
      and exists (select 1 from Table1 where Avail = 1 and UID = t1.UID and Hour = t1.Hour + 1)
      and exists (select 1 from Table1 where Avail = 1 and UID = t1.UID and Hour = t1.Hour + 2);
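The correlated version doesn't have to grow one subquery per hour: a single COUNT over the next @n hours takes a variable directly. A self-contained sketch, assuming at most one row per (UID, Hour) as in the sample; the table variable @t is a hypothetical stand-in for table1:

```sql
DECLARE @n int = 3;

-- Hypothetical stand-in for table1, loaded with the question's sample rows
DECLARE @t TABLE (UID int, Avail int, Hour int);
INSERT INTO @t (UID, Avail, Hour) VALUES
    (123, 1, 0), (123, 1, 1), (123, 0, 2), (123, 0, 3), (123, 0, 4),
    (123, 1, 5), (123, 1, 6), (123, 1, 7), (123, 1, 8),
    (341, 1, 0), (341, 1, 1), (341, 0, 2), (341, 1, 3), (341, 1, 4),
    (341, 0, 5), (341, 1, 6), (341, 1, 7), (341, 0, 8);

-- A user qualifies if some available hour starts a block of @n hours
-- (t1.Hour through t1.Hour + @n - 1) in which every hour is available.
SELECT DISTINCT t1.UID
FROM @t AS t1
WHERE t1.Avail = 1
  AND (SELECT COUNT(*)
       FROM @t AS t2
       WHERE t2.UID = t1.UID
         AND t2.Avail = 1
         AND t2.Hour BETWEEN t1.Hour AND t1.Hour + @n - 1) = @n;
-- Returns 123 for @n = 3
```

Because it compares Hour values rather than counting rows, a missing hour lowers the count instead of being silently bridged.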
    

    Yet another way is to use ROW_NUMBER() to find islands (runs of consecutive rows with the same Avail value) and then filter each island by its sum of Avail:

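    ;with c as (
        select 
           uid, avail, 
           -- constant within each run of equal avail values for a uid
           row_number() over (partition by uid order by hour) 
           - row_number() over (partition by uid, avail order by hour) grp
        from table1
    )
    select distinct uid    -- a user may have several qualifying islands
    from c
    group by uid, avail, grp
    having sum(avail) >= 3;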
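Listing grp for UID 341 shows how the islands are keyed (inline VALUES rows stand in for table1). An unavailable run and an available run can end up with the same grp value, so an island is really identified by (uid, avail, grp). Grouping on (uid, grp) alone still works with SUM(avail), because the merged unavailable rows add 0, but it would overcount with COUNT(*). Since the threshold sits in the HAVING clause rather than in a window frame, the literal 3 can be replaced with a variable such as @n, with no dynamic SQL needed.

```sql
-- Island keys for UID 341
SELECT UID, Hour, Avail,
       ROW_NUMBER() OVER (PARTITION BY UID ORDER BY Hour)
       - ROW_NUMBER() OVER (PARTITION BY UID, Avail ORDER BY Hour) AS grp
FROM (VALUES (341, 1, 0), (341, 1, 1), (341, 0, 2), (341, 1, 3), (341, 1, 4),
             (341, 0, 5), (341, 1, 6), (341, 1, 7), (341, 0, 8)) AS t (UID, Avail, Hour)
ORDER BY Hour;
-- Hour:  0  1  2  3  4  5  6  7  8
-- Avail: 1  1  0  1  1  0  1  1  0
-- grp:   0  0  2  1  1  4  2  2  6
-- Each run of equal Avail shares one grp, but hour 2 (Avail = 0) and
-- hours 6-7 (Avail = 1) both get grp = 2.
```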