sqlsql-servergaps-and-islandsazure-sql-managed-instance

sql: how to find consecutive dates where condition is met in aggregate


I am working in Azure SQL MI. I am struggling with a certain SQL query. I have a "timesheet" table like the following:

EmployeeID    DateWorked    LaborType    Hours
----------------------------------------------
1             2024-01-01    work         13
1             2024-01-02    work         12
1             2024-01-03    no work      24
1             2024-01-04    work         8
2             2024-01-01    no work      24
2             2024-01-02    work         11
2             2024-01-03    work         8
2             2024-01-04    work         13

Every employee will have 1 and only 1 row for every date. The "no work" hours will always be 24 hours.

We have a rule where an employee is not allowed to work more than 24 hours across 2 consecutive days. I need to find all instances of this rule violation. For example, employee 1 violates the rule on dates 2024-01-01 and 2024-01-02 because he worked 25 (13 + 12) hours. He does not violate the rule on dates 2024-01-02 and 2024-01-03 because the 2024-01-03 row is "no work". Similarly, employee 2 has no rule violations.

So, I expect output like the following:

EmployeeID     ViolationDates            TotalHoursWorked
-------------------------------------------------------
1              2024-01-01, 2024-01-02        25

How can I write a query to return this output?

If it helps, below is some T-SQL for the sample data.

CREATE TABLE dbo.timesheet_mockup
    (
        EmployeeID int
        ,DateWorked date
        ,LaborType varchar(7)
        ,Hours int
    )
    ;

INSERT INTO
    dbo.timesheet_mockup
VALUES
    (1, '2024-01-01', 'work', 13)
    ,(1, '2024-01-02', 'work', 12)
    ,(1, '2024-01-03', 'no work', 24)
    ,(1, '2024-01-04', 'work', 8)
    ,(2, '2024-01-01', 'no work', 24)
    ,(2, '2024-01-02', 'work', 11)
    ,(2, '2024-01-03', 'work', 8)
    ,(2, '2024-01-04', 'work', 13)
;

Solution

  • select *, t1.Hours + t2.Hours as TotalHoursWorked
    from dbo.timesheet_mockup t1 inner join dbo.timesheet_mockup t2
        on t2.EmployeeID = t1.EmployeeID
            and t2.DateWorked = dateadd(day, -1, t1.DateWorked)
            and t1.LaborType = 'work' and t2.LaborType = 'work'
    where t1.Hours + t2.Hours > 24;
    

    You could generalize this across multiple rows very readily with a cross apply:

    select *
    from dbo.timesheet_mockup t1 cross apply (
        select sum(Hours) as TotalHoursWorked
        from dbo.timesheet_mockup t2
        where   t2.EmployeeID = t1.EmployeeID
            and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
            and t2.LaborType = 'work'
    ) as lookback
    where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS;
    

    sum() over (... range between) would work except that SQL Server doesn't support this window with range. You could still filter the non-work days in the summation but it does assume there are no gaps in the data:

    with data as (
        select *,
            sum(case when LaborType = 'work' then Hours else 0 end) over (
                partition by EmployeeID order by DateWorked
                rows between 1 preceding and current row) as TotalHoursWorked
        from dbo.timesheet_mockup
    )
    select * from data where TotalHoursWorked > 24;
    

    Eventually this range query will work on SQL Server too: https://dbfiddle.uk/W8boAC2t (Postgres example)

    And then there's always the simple option to compare lead()/lag() across adjacent rows and all the relevant columns. (Left as an exercise.)

    To guarantee that worked dates are consecutive you could gather the number of non-work hours and then ensure the total was zero:

    select EmployeeID, DateWorked as LastDateWorked, TotalHoursWorked
    from dbo.timesheet_mockup t1 cross apply (
        select
            sum(case when t2.LaborType = 'work'    then Hours else 0 end) as TotalHoursWorked,
            sum(case when t2.LaborType = 'no work' then Hours else 0 end) as TotalHoursOff
        from dbo.timesheet_mockup t2
        where   t2.EmployeeID = t1.EmployeeID
            and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
    ) as lookback
    where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS and TotalHoursOff = 0
    order by EmployeeID, LastDateWorked;
    

    Or by only looking at consecutive dates in the first place (which also eliminates the concern about gaps in the data):

    with consecutive_dates as (
       select *,
           datediff(day, cast('20000101' as date), DateWorked)
               - row_number() over (partition by EmployeeID order by DateWorked) as grp
       from dbo.timesheet_mockup
       where LaborType = 'work'
    ), agg as (
        select *,
            count(*) over (
                partition by EmployeeID, grp order by DateWorked
                rows between DAYS preceding and current row) as NumConsecutiveDates,
            sum(Hours) over (
                partition by EmployeeID, grp order by DateWorked
                rows between DAYS preceding and current row) as TotalHoursWorked
        from consecutive_dates
    )
    select EmployeeID, DateWorked as LastDateWorked, NumConsecutiveDates, TotalHoursWorked
    from agg where TotalHoursWorked > MAXHOURS;