sqlpostgresqlwindow-functionsgaps-and-islands

Copy an ID to rows with adjacent timestamp ranges sharing a class


Here is a sample table.

create table test(ID,Start_date_time,End_date_time,class)
as values
 (131,  '5/26/2021 11:42',  '5/26/2021 12:42',  'AAA')
,(132,  '5/26/2021 12:42',  '5/26/2021 13:18',  'AAA')
,(113,  '5/26/2021 12:44',  '5/26/2021 13:19',  'AAA')
,(114,  '5/26/2021 13:19',  '5/26/2021 13:34',  'AAA')
,(115,  '5/26/2021 13:34',  '5/26/2021 13:44',  'AAA')
,(111,  '5/26/2021 16:09',  '5/26/2021 17:24',  'AAA')
,(112,  '5/26/2021 17:24',  '5/26/2021 18:09',  'AAA')
,(123,  '5/26/2021 8:08',   '5/26/2021 10:08',  'BBB')
,(124,  '5/26/2021 10:08',  '5/26/2021 11:08',  'BBB')
,(116,  '5/26/2021 11:06',  '5/26/2021 11:30',  'BBB')
,(117,  '5/26/2021 11:30',  '5/26/2021 12:18',  'BBB')
,(118,  '5/26/2021 12:18',  '5/26/2021 13:06',  'BBB')
,(128,  '5/26/2021 9:03',   '5/26/2021 9:48',   'CCC')
,(129,  '5/26/2021 9:48',   '5/26/2021 10:03',  'CCC')
,(130,  '5/26/2021 10:03',  '5/26/2021 10:13',  'CCC')
,(119,  '5/26/2021 10:22',  '5/26/2021 10:34',  'CCC')
,(120,  '5/26/2021 10:34',  '5/26/2021 10:58',  'CCC')
,(121,  '5/26/2021 10:58',  '5/26/2021 11:10',  'CCC')
,(122,  '5/26/2021 11:10',  '5/26/2021 11:22',  'CCC')
,(125,  '5/26/2021 14:47',  '5/26/2021 15:47',  'DDD')
,(126,  '5/26/2021 15:47',  '5/26/2021 16:35',  'DDD')
,(127,  '5/26/2021 16:35',  '5/26/2021 17:35',  'DDD')
;

I would like to have the table below. The idea is to propagate the first id in a class to all immediately adjacent time ranges of the same class.
Example: id 131 ends at 12:42, right when id 132 starts, so id 131 gets copied over as 132's updated_id.

id start_date_time end_date_time class updated_id
131 5/26/2021 11:42 5/26/2021 12:42 AAA 131
132 5/26/2021 12:42 5/26/2021 13:18 AAA 131
113 5/26/2021 12:44 5/26/2021 13:19 AAA 113
114 5/26/2021 13:19 5/26/2021 13:34 AAA 113
115 5/26/2021 13:34 5/26/2021 13:44 AAA 113
111 5/26/2021 16:09 5/26/2021 17:24 AAA 111
112 5/26/2021 17:24 5/26/2021 18:09 AAA 111
123 5/26/2021 8:08 5/26/2021 10:08 BBB 123
124 5/26/2021 10:08 5/26/2021 11:08 BBB 123
116 5/26/2021 11:06 5/26/2021 11:30 BBB 116
117 5/26/2021 11:30 5/26/2021 12:18 BBB 116
118 5/26/2021 12:18 5/26/2021 13:06 BBB 116
128 5/26/2021 9:03 5/26/2021 9:48 CCC 128
129 5/26/2021 9:48 5/26/2021 10:03 CCC 128
130 5/26/2021 10:03 5/26/2021 10:13 CCC 128
119 5/26/2021 10:22 5/26/2021 10:34 CCC 119
120 5/26/2021 10:34 5/26/2021 10:58 CCC 119
121 5/26/2021 10:58 5/26/2021 11:10 CCC 119
122 5/26/2021 11:10 5/26/2021 11:22 CCC 119
125 5/26/2021 14:47 5/26/2021 15:47 DDD 125
126 5/26/2021 15:47 5/26/2021 16:35 DDD 125
127 5/26/2021 16:35 5/26/2021 17:35 DDD 125

Solution

    1. Compare start/end timestamps using lead() or lag() over a window with rows of the same class. If the end of one row isn't the start timestamp of its consecutive row, there's a "gap".
    2. Take a stepping count() of only gaps - it'll show you continuous "islands". Walking a few rows without finding more gaps means you're still on the same island.
    3. Use first_value() to propagate the first id on each island to all its rows.

    demo at db<>fiddle

    with gaps as(
      select*,Start_date_time<>lag(End_date_time,1,Start_date_time)over w1 as "gap"
      from test
      window w1 as(partition by class order by Start_date_time))
    ,islands as(
      select*,count(*)filter(where "gap")over w2 as "island"
      from gaps
      window w2 as(partition by class order by Start_date_time))
    select ID
         , Start_date_time
         , End_date_time
         , class
         , first_value(id)over w3 as Updated_ID
    from islands
    window w3 as(partition by class,"island" order by Start_date_time);