I have a table which looks like below:
CREATE TABLE time_records (
id uuid NOT NULL,
employee_id uuid NOT NULL,
starttime timestampt NOT NULL,
endtime timestampt NOT NULL
)
There will be overlap in times between the records for the same employee_id:
id | employee_id | starttime | endtime |
---|---|---|---|
1 | 1 | '2023-09-01 07:00:00' | '2023-09-01 09:15:00' |
2 | 1 | '2023-09-01 07:00:00' | '2023-09-01 15:00:00' |
3 | 1 | '2023-09-01 07:00:00' | '2023-09-01 15:00:00' |
4 | 1 | '2023-09-01 14:00:00' | '2023-09-01 15:00:00' |
5 | 1 | '2023-09-01 23:45:00' | '2023-09-01 23:59:00' |
6 | 1 | '2023-09-01 23:45:00' | '2023-09-01 23:59:00' |
What I'm trying to do is get the time ranges within all of these times:
employee_id | starttime | endtime | ids |
---|---|---|---|
1 | '2023-09-01 07:00:00' | '2023-09-01 15:00:00' | [1,2,3,4] |
1 | '2023-09-01 23:45:00' | '2023-09-01 23:29:00' | [5,6] |
I can get this to work if there's only one set of overlapping time within a day using max/min for the start and end times, but I can't seem to make it work when there are multiple sets of overlapping time in a day:
select timea.employee_id,
min(timea.starttime) starttime,
max(timea.endtime) endtime,
array_agg(timea.id) ids
from time_records timea
inner join time_records timea2 on timea.employee_id = timea2.employee_id and
tsrange(timea2.starttime, timea2.endtime, '[]') &&
tsrange(timea.starttime, timea.endtime, '[]')
and timea.id != timea2.id
group by timea.employee_id;
With results:
employee_id | starttime | endtime | ids |
---|---|---|---|
1 | '2023-09-01 07:00:00' | '2023-09-01 23:59:00' | [1,2,3,4,5,6] |
make it work when there are multiple sets of overlapping time in a day
Plain aggregation with min()
and max()
cannot solve this. Which rows eventually form a group only becomes evident after merging ranges.
The aggregate function range_agg()
makes the task a whole lot simpler. It was added with Postgres 14. Just computing merged ranges is very simple now:
SELECT unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged_range
FROM time_records;
To also get an array of involved IDs, we need to do more. One way is to join back to the underlying table and then aggregate once more (groups are identified by the merged ranges now):
SELECT employee_id, lower(merged) AS starttime, upper(merged) AS endtime
, array_agg(t.id) AS ids
FROM (
SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
FROM time_records
GROUP BY employee_id
) r
JOIN time_records t USING (employee_id)
WHERE r.merged @> t.starttime
GROUP BY r.employee_id, r.merged
ORDER BY r.employee_id, r.merged;
Another way with a LATERAL
subquery:
SELECT r.employee_id, lower(r.merged) AS starttime, upper(r.merged) AS endtime, i.ids
FROM (
SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
FROM time_records
GROUP BY employee_id
) r
CROSS JOIN LATERAL (
SELECT ARRAY (
SELECT t.id
FROM time_records t
WHERE t.employee_id = r.employee_id
AND t.starttime <@ r.merged
ORDER BY t.id
)
) i (ids)
ORDER BY r.employee_id, r.merged;
Related:
Not sure if either query is also faster than my custom function below, because that only iterates over the whole table once.
While stuck on your outdated version, create a custom set-returning function (once):
CREATE OR REPLACE FUNCTION public.f_merge_ranges()
RETURNS TABLE (
employee_id int
, starttime timestamp
, endtime timestamp
, ids int[]
)
LANGUAGE plpgsql AS
$func$
DECLARE
r record; -- current row
BEGIN
FOR r IN
SELECT t.id, t.employee_id, t.starttime, t.endtime
FROM time_records t
ORDER BY t.employee_id, t.starttime, t.endtime DESC, t.id -- better take longer range first
LOOP
IF r.employee_id = employee_id THEN -- works for null in first iteration
IF r.starttime > endtime THEN
RETURN NEXT;
starttime := r.starttime;
endtime := r.endtime;
ids := ARRAY[r.id];
ELSE
ids := ids || r.id;
IF r.endtime > endtime THEN
endtime := r.endtime;
END IF;
END IF;
ELSE
IF employee_id IS NOT NULL THEN -- catch first iteration
RETURN NEXT;
END IF;
employee_id := r.employee_id;
starttime := r.starttime;
endtime := r.endtime;
ids := ARRAY[r.id];
END IF;
END LOOP;
-- return last row (if any)
IF FOUND THEN
RETURN NEXT;
END IF;
END
$func$;
Call:
SELECT * FROM public.f_merge_ranges();
Unlike the queries above, the array in ids
is unsorted. You need to do more if you need that.
Related: