postgresqllagpostgresql-9.4row-numberlead

postgresql - find Discontinuous id and get read_time


I have a table like this, and there are three cases,

## case a
| rec_no |      read_time      | id
+--------+---------------------+----
|  45139 | 2023-02-07 17:00:00 | a
|  45140 | 2023-02-07 17:15:00 | a
|  45141 | 2023-02-07 17:30:00 | a
|  45142 | 2023-02-07 18:15:00 | a
|  45143 | 2023-02-07 18:30:00 | a
|  45144 | 2023-02-07 18:45:00 | a
## case b
| rec_no |      read_time      | id
+--------+---------------------+----
|  21735 | 2023-02-01 19:15:00 | b
|  21736 | 2023-02-01 19:30:00 | b
|  21742 | 2023-02-01 21:00:00 | b
|  21743 | 2023-02-01 21:15:00 | b
|  21744 | 2023-02-01 21:30:00 | b
|  21745 | 2023-02-01 21:45:00 | b
## case c
| rec_no |      read_time      | id
+--------+---------------------+----
|  12345 | 2023-02-02 12:15:00 | c
|  12346 | 2023-02-02 12:30:00 | c
|  12347 | 2023-02-02 12:45:00 | c
|  12348 | 2023-02-02 13:15:00 | c
|  12352 | 2023-02-02 14:00:00 | c
|  12353 | 2023-02-02 14:15:00 | c

I'd like to find out the missing readtime field when the rec is not continuous.

I'd like something like this,

## case a
## nothing because rec_no is continous
|      read_time      | id
+---------------------+----
## case b
## get six rows
|      read_time      | id
+--------+-----------------
| 2023-02-01 19:45:00 | b
| 2023-02-01 20:00:00 | b
| 2023-02-01 20:15:00 | b
| 2023-02-01 20:30:00 | b
| 2023-02-01 20:45:00 | b
| 2023-02-01 21:00:00 | b
## case c
## get two rows (13:00:00 is missing but rec_no is continous)
|      read_time      | id
+--------+-----------------
| 2023-02-02 13:30:00 | c
| 2023-02-02 13:45:00 | c

Is there a way to do this ? The output format is not too important as long as I can get the result correctly.


Solution

  • step-by-step demo: db<>fiddle

    SELECT
        rec_no,
        id,
        gs
    FROM (
        SELECT
            *,
            lead(rec_no) OVER (PARTITION BY id ORDER BY rec_no) - rec_no > 1 AS is_gap,  -- 1
            lead(read_time) OVER (PARTITION BY id ORDER BY rec_no) as next_read_time
        FROM mytable
    )s, generate_series(                                                                 -- 3
          read_time + interval '15 minutes',                                             -- 4
          next_read_time - interval '15 minutes', 
          interval '15 minutes'
      ) as gs
    WHERE is_gap                                                                         -- 2 
    
    1. Use lead() window function to move the next rec_no value and the next read_time value to the current row. With this you can check if the difference between the current and next rec_no values are greater than 1.
    2. Filter all records with greater differences
    3. Generate a time series with 15 minutes interval
    4. Because the series includes start and end, you need a start at the next 15 minutes points (+ interval) and end one "slot" before the next recorded value (- interval).