I am working in Azure SQL MI. I am struggling with a certain SQL query. I have a "timesheet" table like the following:
EmployeeID DateWorked LaborType Hours
----------------------------------------------
1 2024-01-01 work 13
1 2024-01-02 work 12
1 2024-01-03 no work 24
1 2024-01-04 work 8
2 2024-01-01 no work 24
2 2024-01-02 work 11
2 2024-01-03 work 8
2 2024-01-04 work 13
Every employee will have 1 and only 1 row for every date. The "no work" hours will always be 24 hours.
We have a rule where an employee is not allowed to work more than 24 hours across 2 consecutive days. I need to find all instances of this rule violation. For example, employee 1 violates the rule on dates 2024-01-01 and 2024-01-02 because he worked 25 (13 + 12) hours. He does not violate the rule on dates 2024-01-02 and 2024-01-03 because the 2024-01-03 row is "no work". Similarly, employee 2 has no rule violations.
So, I expect output like the following:
EmployeeID ViolationDates TotalHoursWorked
-------------------------------------------------------
1 2024-01-01, 2024-01-02 25
How can I write a query to return this output?
If it helps, below is some T-SQL for the sample data.
CREATE TABLE dbo.timesheet_mockup
(
EmployeeID int
,DateWorked date
,LaborType varchar(7)
,Hours int
)
;
INSERT INTO
dbo.timesheet_mockup
VALUES
(1, '2024-01-01', 'work', 13)
,(1, '2024-01-02', 'work', 12)
,(1, '2024-01-03', 'no work', 24)
,(1, '2024-01-04', 'work', 8)
,(2, '2024-01-01', 'no work', 24)
,(2, '2024-01-02', 'work', 11)
,(2, '2024-01-03', 'work', 8)
,(2, '2024-01-04', 'work', 13)
;
select *, t1.Hours + t2.Hours as TotalHoursWorked
from dbo.timesheet_mockup t1 inner join dbo.timesheet_mockup t2
on t2.EmployeeID = t1.EmployeeID
and t2.DateWorked = dateadd(day, -1, t1.DateWorked)
and t1.LaborType = 'work' and t2.LaborType = 'work'
where t1.Hours + t2.Hours > 24;
You could generalize this across multiple rows very readily with a cross apply
:
select *
from dbo.timesheet_mockup t1 cross apply (
select sum(Hours) as TotalHoursWorked
from dbo.timesheet_mockup t2
where t2.EmployeeID = t1.EmployeeID
and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
and t2.LaborType = 'work'
) as lookback
where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS;
sum() over (... range between)
would work except that SQL Server doesn't support this window with range
. You could still filter the non-work days in the summation but it does assume there are no gaps in the data:
with data as (
select *,
sum(case when LaborType = 'work' then Hours else 0 end) over (
partition by EmployeeID order by DateWorked
rows between 1 preceding and current row) as TotalHoursWorked
from dbo.timesheet_mockup
)
select * from data where TotalHoursWorked > 24;
Eventually this range
query will work on SQL Server too: https://dbfiddle.uk/W8boAC2t (Postgres example)
And then there's always the simple option to compare lead()
/lag()
across adjacent rows and all the relevant columns. (Left as an exercise.)
To guarantee that worked dates are consecutive you could gather the number of non-work hours and then ensure the total was zero:
select EmployeeID, DateWorked as LastDateWorked, TotalHoursWorked
from dbo.timesheet_mockup t1 cross apply (
select
sum(case when t2.LaborType = 'work' then Hours else 0 end) as TotalHoursWorked,
sum(case when t2.LaborType = 'no work' then Hours else 0 end) as TotalHoursOff
from dbo.timesheet_mockup t2
where t2.EmployeeID = t1.EmployeeID
and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
) as lookback
where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS and TotalHoursOff = 0
order by EmployeeID, LastDateWorked;
Or by only looking at consecutive dates in the first place (which also eliminates the concern about gaps in the data):
with consecutive_dates as (
select *,
datediff(day, cast('20000101' as date), DateWorked)
- row_number() over (partition by EmployeeID order by DateWorked) as grp
from dbo.timesheet_mockup
where LaborType = 'work'
), agg as (
select *,
count(*) over (
partition by EmployeeID, grp order by DateWorked
rows between DAYS preceding and current row) as NumConsecutiveDates,
sum(Hours) over (
partition by EmployeeID, grp order by DateWorked
rows between DAYS preceding and current row) as TotalHoursWorked
from consecutive_dates
)
select EmployeeID, DateWorked as LastDateWorked, NumConsecutiveDates, TotalHoursWorked
from agg where TotalHoursWorked > MAXHOURS;