trying to get the following SQL Presto code to work so that I can perform business day calculations. It works if I write "d -> day_of_week(d) not in (6,7)" but I also need to filter out holidays from a calendar hence the "and contains(array_agg(date), d)" clause which does not work. I have also tried contains(date, d) but this failed as well. Please advise.
WITH dataset(start_date, end_date) AS (
values (date '2021-10-01', date '2021-10-05'),
(date '2021-10-01', date '2021-10-03'),
(date '2021-10-02', date '2021-10-10'),
(date '2021-10-02', date '2021-10-08'),
(date '2021-10-02', date '2021-10-05')
),
holidays (row, date) AS (
values (1,date '2021-10-08'))
select start_date,
end_date,
cardinality(filter(
sequence(start_date, end_date, interval '1' day),
d -> day_of_week(d) not in (6,7) and not contains(array_agg(date), d)
)) business_days
from dataset, holidays
You can flatten the data, filter and then aggregate it back:
-- sample data
WITH dataset(start_date, end_date) AS (
values (date '2021-10-01', date '2021-10-05'),
(date '2021-10-01', date '2021-10-03'),
(date '2021-10-02', date '2021-10-10'),
(date '2021-10-02', date '2021-10-08'),
(date '2021-10-02', date '2021-10-05')
),
holidays (row, date) AS (
values (1,date '2021-10-08')),
-- query
with_ids as (
select start_date,
end_date,
row_number() over () rn
from dataset
)
select arbitrary(start_date) start_date,
arbitrary(end_date) end_date,
array_agg(day) business_days
from with_ids,
unnest(sequence(start_date, end_date, interval '1' day)) as t(day)
where day not in (select date from holidays)
group by rn;
Output:
start_date | end_date | business_days |
---|---|---|
2021-10-01 | 2021-10-05 | [2021-10-01, 2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05] |
2021-10-02 | 2021-10-10 | [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05, 2021-10-06, 2021-10-07, 2021-10-09, 2021-10-10] |
2021-10-02 | 2021-10-08 | [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05, 2021-10-06, 2021-10-07] |
2021-10-02 | 2021-10-05 | [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05] |
2021-10-01 | 2021-10-03 | [2021-10-01, 2021-10-02, 2021-10-03] |
The only problem here is that this will result in deletion of ranges without business days (you can bring them back by left joining the original data).