sqlpostgresqltemporal

Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)


I have a PostgreSQL table that represents the time-bound associations of different sensors with different labels, using a SCD-2 approach. The table structure is as follows:

CREATE TABLE SensorLabel (
    sensor_id INT,
    label_id INT,
    start_time TIMESTAMPTZ,
    end_time TIMESTAMPTZ
);

Each row in this table represents that a sensor (sensor_id) is associated with a label (label_id) during a specific time period [start_time to end_time) (so, resp. inclusive, exclusive).

Now I have a problem where I need to find all time range overlaps for a set of labels. That is, I have a set of labels {label1, label2, ..., labelN} and I want to find all time ranges when the sensor was associated with each of these labels at the same time.

Please note that the time range for each label association could be different, so the intersection of these ranges may break up into smaller time-ranges. I want to return only those time-ranges during which the sensor is associated with all the given labels.

I can solve for say a static 2 labels or 3 labels but have problems wrapping this in a generic SQL that solves for a variable N labels

EDIT: some sample input and output

sensor|label|from|to
1|1|2021-01-01|2021-10-01
1|2|2020-12-01|2021-05-01
1|2|2021-07-01|2021-09-01
1|3|2021-03-01|2021-06-01
1|3|2021-08-01|2021-12-01

Ouput: i.e.: time-ranges where (time-ranges for) label 1,2,3, overlap:

sensor|from|to
1|2021-03-01|2021-05-01
1|2021-08-01|2021-09-01

Solution

  • The only way I could do it is with a function:

    CREATE TABLE sensorlabel (
        sensor_id INT,
        label_id INT,
        start_time TIMESTAMPTZ,
        end_time TIMESTAMPTZ
    );
    
    insert into sensorlabel values
    (1,1,'2021-01-01','2021-10-01'),
    (1,2,'2020-12-01','2021-05-01'),
    (1,2,'2021-07-01','2021-09-01'),
    (1,3,'2021-03-01','2021-06-01'),
    (1,3,'2021-08-01','2021-12-01');
    
    CREATE OR REPLACE FUNCTION public.tsrange_fnc(l_ids integer[], s_id integer)
     RETURNS SETOF tstzrange
     LANGUAGE plpgsql
    AS $function$
    DECLARE
    
        _tsrange tstzrange;
        _prevrange tstzrange := NULL;
        _testrange tstzrange;
    
    BEGIN
    for _tsrange in select
            tstzrange(start_time, end_time)
        from
            sensorlabel
        where
            label_id = ANY(l_ids)
        and
            sensor_id = s_id
        order by
            start_time  LOOP
        IF _prevrange IS NULL THEN
            _prevrange = _tsrange;
        ELSE
            _testrange = _tsrange * _prevrange;
            IF _testrange = 'empty'::tstzrange THEN
                RETURN NEXT _prevrange;
                _prevrange = _tsrange;
            ELSE
               _prevrange =  _tsrange * _prevrange;
            END IF;
        END IF;
    
    
    END LOOP;
                RETURN NEXT _prevrange;
    END;
    $function$
    
    
    
    
    
    select tsrange_fnc(ARRAY[1, 2, 3], 1);
                          tsrange_fnc                      
    -------------------------------------------------------
     ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
     ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
    
     select tsrange_fnc(ARRAY[1, 2], 1);
                          tsrange_fnc                      
    -------------------------------------------------------
     ["01/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
     ["07/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")
    
    select tsrange_fnc(ARRAY[2, 3], 1);
                          tsrange_fnc                      
    -------------------------------------------------------
     ["03/01/2021 00:00:00 PST","05/01/2021 00:00:00 PDT")
     ["08/01/2021 00:00:00 PDT","09/01/2021 00:00:00 PDT")