I'm trying to solve a problem in the following (simplified) dataset:
Name | Date | Workday | Calenderday | Leave |
---|---|---|---|---|
PersonA | 2023-01-01 | 0 | 1 | NULL |
PersonA | 2023-01-07 | 0 | 1 | NULL |
PersonA | 2023-01-08 | 0 | 1 | NULL |
PersonA | 2023-01-13 | 1 | 1 | Sick |
PersonA | 2023-01-14 | 0 | 1 | NULL |
PersonA | 2023-01-15 | 0 | 1 | NULL |
PersonA | 2023-01-16 | 1 | 1 | Sick |
PersonA | 2023-01-20 | 1 | 1 | Holiday |
PersonA | 2023-01-21 | 0 | 1 | NULL |
PersonA | 2023-01-22 | 0 | 1 | NULL |
PersonA | 2023-01-23 | 1 | 1 | Holiday |
PersonB | 2023-01-01 | 0 | 1 | NULL |
PersonB | 2023-01-02 | 1 | 1 | Sick |
PersonB | 2023-01-03 | 1 | 1 | Sick |
Where the lines with NULL in [Leave] is weekend.
What I want is a result looking like this:
Name | Leave | PeriodStartDate | PeriodEndDate | Workdays | Weekdays |
---|---|---|---|---|---|
PersonA | Sick | 2023-01-13 | 2023-01-16 | 2 | 4 |
PersonA | Holiday | 2023-01-20 | 2023-01-23 | 2 | 4 |
PersonB | Sick | 2023-01-02 | 2023-01-03 | 2 | 2 |
where the difference between [Workdays] and [Weekdays] is that weekdays also counts the weekend.
What I have been trying is to first make a row (in two different ways)
ROW_NUMBER() OVER (PARTITION BY \[Name\] ORDER BY \[Date\]) as RowNo1
ROW_NUMBER() OVER (PARTITION BY \[Name\], \[Leave\] ORDER BY \[Date\]) as RowNo2
and after that to make a period base date:
DATEADD(DAY, 0 - \[RowNo1\], Date) as PeriodBaseDate1
,DATEADD(DAY, 0 - \[RowNo2\], \[Date\]) as PeriodBaseDate2
and after that do something like this:
MIN(\[Date\]) as PeriodStartDate
,MAX(\[Dato\]) as PeriodEndDate
,SUM(\[Calenderday\]) as Weekdays
,SUM(\[Workday\]) as Workdays
GROUP BY \[PeriodBaseDate (1 or 2?)\], \[Leave\], \[Name\]
But whatever I do I can't seem to get it to count the weekends in the periods.
It doesn't have to include my try with the RowNo, PeriodBaseDate etc.
As we don't have your actual full solutions, I've provided a full working one. I firstly use LAST_VALUE
to have all the rows have a value for their Leave
(provided there was a non-NULL
value previously).
Once I do that, you have a gaps and island problem, and can aggregate based on that.
I assume you are using SQL Server 2022, the latest version of SQL Server at the time of writing, as no version details are given and thus have access to the IGNORE NULLS
syntax.
SELECT *
INTO dbo.YourTable
FROM (VALUES('PersonA',CONVERT(date,'2023-01-01'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-07'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-08'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-13'),1,1,'Sick'),
('PersonA',CONVERT(date,'2023-01-14'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-15'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-16'),1,1,'Sick'),
('PersonA',CONVERT(date,'2023-01-20'),1,1,'Holiday'),
('PersonA',CONVERT(date,'2023-01-21'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-22'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-23'),1,1,'Holiday'),
('PersonB',CONVERT(date,'2023-01-01'),0,1,NULL),
('PersonB',CONVERT(date,'2023-01-02'),1,1,'Sick'),
('PersonB',CONVERT(date,'2023-01-03'),1,1,'Sick'))V(Name,Date,Workday,Calenderday,Leave);
GO
WITH Leaves AS(
SELECT Name,
[Date],
Workday,
Calenderday, --It's spelt Calendar, you should correct this typopgraphical error as objects with typoes lead to further problems.
--Leave,
LAST_VALUE(Leave) IGNORE NULLS OVER (PARTITION BY Name ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Leave
FROM dbo.YourTable YT),
LeaveGroups AS(
SELECT Name,
[Date],
Workday,
CalenderDay,
Leave,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY Name, Leave ORDER BY Date) AS Grp
FROM Leaves)
SELECT Name,
Leave,
MIN([Date]) AS PeriodStartDate,
MAX([Date]) AS PeriodEndDate,
SUM(WorkDay) AS WorkDays, --Assumes Workday is not a bit, if it is, CAST or CONVERT it to a int
DATEDIFF(DAY,MIN([Date]), MAX([Date]))+1 AS Weekdays
--SUM(CASE WHEN (DATEPART(WEEKDAY,[Date]) + @@DATEFIRST + 5) % 7 BETWEEN 0 AND 4 THEN 1 END) AS Weekdays --This method is language agnostic
FROM LeaveGroups
WHERE Leave IS NOT NULL
GROUP BY Name,
Leave,
Grp
ORDER BY Name,
PeriodStartDate;
GO
DROP TABLE dbo.YourTable;