sqlpostgresqlgaps-and-islandspostgresql-9.6

Query to find unique time periods from list of time ranges


I have a table which looks like below:

CREATE TABLE time_records (
  id uuid NOT NULL,
  employee_id uuid NOT NULL,
  starttime timestampt NOT NULL,
  endtime timestampt NOT NULL
)

There will be overlap in times between the records for the same employee_id:

id employee_id starttime endtime
1 1 '2023-09-01 07:00:00' '2023-09-01 09:15:00'
2 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
3 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
4 1 '2023-09-01 14:00:00' '2023-09-01 15:00:00'
5 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'
6 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'

What I'm trying to do is get the time ranges within all of these times:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 15:00:00' [1,2,3,4]
1 '2023-09-01 23:45:00' '2023-09-01 23:29:00' [5,6]

I can get this to work if there's only one set of overlapping time within a day using max/min for the start and end times, but I can't seem to make it work when there are multiple sets of overlapping time in a day:

select timea.employee_id,
       min(timea.starttime) starttime,
       max(timea.endtime)   endtime,
       array_agg(timea.id) ids
from time_records timea
         inner join time_records timea2 on timea.employee_id = timea2.employee_id and
                                           tsrange(timea2.starttime, timea2.endtime, '[]') &&
                                           tsrange(timea.starttime, timea.endtime, '[]')
    and timea.id != timea2.id
group by timea.employee_id;

With results:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 23:59:00' [1,2,3,4,5,6]

Solution

  • make it work when there are multiple sets of overlapping time in a day

    Plain aggregation with min() and max() cannot solve this. Which rows eventually form a group only becomes evident after merging ranges.

    The aggregate function range_agg() makes the task a whole lot simpler. It was added with Postgres 14. Just computing merged ranges is very simple now:

    SELECT unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged_range
    FROM   time_records;
    

    To also get an array of involved IDs, we need to do more. One way is to join back to the underlying table and then aggregate once more (groups are identified by the merged ranges now):

    SELECT employee_id, lower(merged) AS starttime, upper(merged) AS endtime
         , array_agg(t.id) AS ids
    FROM  (
       SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
       FROM   time_records
       GROUP  BY employee_id
       ) r
    JOIN   time_records t USING (employee_id)
    WHERE  r.merged @> t.starttime
    GROUP  BY r.employee_id, r.merged
    ORDER  BY r.employee_id, r.merged;
    

    Another way with a LATERAL subquery:

    SELECT r.employee_id, lower(r.merged) AS starttime, upper(r.merged) AS endtime, i.ids
    FROM  (
       SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
       FROM   time_records
       GROUP  BY employee_id
       ) r
    CROSS  JOIN LATERAL (
       SELECT ARRAY (
          SELECT t.id
          FROM   time_records t
          WHERE  t.employee_id = r.employee_id
          AND    t.starttime <@ r.merged
          ORDER  BY t.id
          )
       ) i (ids)
    ORDER  BY r.employee_id, r.merged;
    

    fiddle

    Related:

    Not sure if either query is also faster than my custom function below, because that only iterates over the whole table once.

    Postgres 9.6

    While stuck on your outdated version, create a custom set-returning function (once):

    CREATE OR REPLACE FUNCTION public.f_merge_ranges()
      RETURNS TABLE (
        employee_id int
      , starttime timestamp
      , endtime timestamp
      , ids int[]
      )
      LANGUAGE plpgsql AS
    $func$
    DECLARE
       r record;  -- current row
    BEGIN
       FOR r IN 
          SELECT t.id, t.employee_id, t.starttime, t.endtime
          FROM   time_records t
          ORDER  BY t.employee_id, t.starttime, t.endtime DESC, t.id  -- better take longer range first
       LOOP
          IF r.employee_id = employee_id THEN  -- works for null in first iteration
             IF r.starttime > endtime THEN
                RETURN NEXT;
                starttime   := r.starttime;
                endtime     := r.endtime;
                ids         := ARRAY[r.id];
             ELSE
                ids         := ids || r.id;
                IF r.endtime > endtime THEN
                   endtime  := r.endtime;
                END IF;
             END IF;
          ELSE
             IF employee_id IS NOT NULL THEN  -- catch first iteration
                RETURN NEXT;
             END IF;
             employee_id := r.employee_id;
             starttime   := r.starttime;
             endtime     := r.endtime;
             ids         := ARRAY[r.id];
          END IF;
       END LOOP;
    
       -- return last row (if any)
       IF FOUND THEN
          RETURN NEXT;
       END IF;
    END
    $func$;
    

    Call:

    SELECT * FROM public.f_merge_ranges();
    

    fiddle

    Unlike the queries above, the array in ids is unsorted. You need to do more if you need that.

    Related: