sqlrecursive-querypostgresql-13

Recursive query to replace null values


With the following table, I need to make a query to replace the null values of “time” (number of minutes, integer) by a value consistent with the preceding and following values, with partitioning by “esz” and sorting by ascending “rank”. In the example, the data is already sorted. Ideally:

id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 NULL
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 NULL
6 2 "QUI" 1 7
7 2 "VER" 2 NULL
8 2 "SOM" 3 NULL
9 2 "NSC" 4 12
10 2 "3SM" 99 NULL
11 3 "NSC" 1 NULL
12 3 "VER" 2 NULL
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 NULL
16 4 "SOM" 1 2
17 4 "QUI" 2 NULL
18 4 "NSC" 3 3
19 4 "VER" 4 NULL
20 4 "3SM" 99 NULL
21 5 "NSC" 1 NULL
22 5 "SOM" 2 4
23 5 "VER" 3 NULL
24 5 "QUI" 4 7
25 5 "3SM" 99 NULL

Sample output:

id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 6
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 16
6 2 "QUI" 1 7
7 2 "VER" 2 8
8 2 "SOM" 3 9
9 2 "NSC" 4 12
10 2 "3SM" 99 13
11 3 "NSC" 1 9
12 3 "VER" 2 10
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 13
16 4 "SOM" 1 2
17 4 "QUI" 2 2
18 4 "NSC" 3 3
19 4 "VER" 4 4
20 4 "3SM" 99 5
21 5 "NSC" 1 3
22 5 "SOM" 2 4
23 5 "VER" 3 5
24 5 "QUI" 4 7
25 5 "3SM" 99 8

I can handle most of the cases in the table with LAG() and LEAD() and CASE. But I still have problems with some of them: id=11 and id=12 for example. Either I have to go backwards from the value of id=13, or I have to know when filling id=11 that I have to leave a value available to fill id=12.

My last query:

    SELECT
      esz,
      code,
      rank,
      CASE
        WHEN time IS NOT NULL THEN time
        ELSE
          CASE
            WHEN rank = 1 THEN 1
            WHEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) + 1
            WHEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) - 1
            WHEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL THEN
              CASE
                WHEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) - LAG(time) OVER(PARTITION BY esz ORDER BY rank) <= 1 THEN LAG(time) OVER(PARTITION BY esz ORDER BY rank)
                ELSE LAG(time) OVER(PARTITION BY esz ORDER BY rank) +1
              END
            WHEN rank > LEAD(rank) OVER(PARTITION BY esz ORDER BY rank) OR LEAD(rank) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN 999
          END
      END AS time
      FROM mytable
      ORDER BY esz, rank

Here's the result:

id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 6
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 16
6 2 "QUI" 1 7
7 2 "VER" 2 8
8 2 "SOM" 3 <11>
9 2 "NSC" 4 12
10 2 "3SM" 99 13
11 3 "NSC" 1 <1>
12 3 "VER" 2 10
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 13
16 4 "SOM" 1 2
17 4 "QUI" 2 2
18 4 "NSC" 3 3
19 4 "VER" 4 4
20 4 "3SM" 99 <999>
21 5 "NSC" 1 <1>
22 5 "SOM" 2 4
23 5 "VER" 3 5
24 5 "QUI" 4 7
25 5 "3SM" 99 8

I've highlighted the values that differ from the expected result. It's not perfect, but there are no inconsistencies. However, there would be a problem if more than 3 consecutive overrides were inserted.


Solution

  • I found a way to resolved this issue correctly.
    As there are two NULL scenario in your inputs. So, design query according to that ...

    select id, esz,  code, ranks,
      -- lag(times) over(partition by esz order by ranks) as prev_time,
      -- lead(times) over(partition by esz order by ranks) as next_time,
      times,
      CASE
          -- Previous || Current*  || Next :: Scenario
          --    ANY   || NOT NULL* || ANY  
          WHEN(times is not NULL) 
               THEN times
          -- NOT NULL ||   NULL*   || NOT NULL
          WHEN(times is NULL AND 
               (lag(times) over(partition by esz order by ranks)) is not NULL AND 
               (lead(times) over(partition by esz order by ranks)) is not NULL) 
               THEN CASE 
                      WHEN((lag(times) over(partition by esz order by ranks) + 1) <
                            (lead(times) over(partition by esz order by ranks)) - 1)
                         THEN ((lag(times) over(partition by esz order by ranks)) + 1)
                      ELSE
                          ((lead(times) over(partition by esz order by ranks)) - 1)
                    END
          --   NOT NULL ||  NULL  ||   NULL*   ||  ANY   (esz = 4 and 2)
          WHEN(times is NULL AND
               (lag(times) over(partition by esz order by ranks)) is NULL AND
               (lag(times, 2) over(partition by esz order by ranks)) is not NULL) 
               THEN CASE
                   WHEN((lead(times) over(partition by esz order by ranks)) is NULL)
                      THEN (lag(times, 2) over(partition by esz order by ranks) + 2)
                   ELSE CASE
                      WHEN((lag(times, 2) over(partition by esz order by ranks) + 2) <
                           (lead(times) over(partition by esz order by ranks) - 1))
                         THEN (lag(times, 2) over(partition by esz order by ranks) + 2)
                      ELSE (lead(times) over(partition by esz order by ranks) - 1)
                   END
                END
           --   ANY  ||   NULL*   ||  NULL || NOT NULL  (esz = 3)
          WHEN(times is NULL AND
               (lead(times) over(partition by esz order by ranks)) is NULL AND
               (lead(times, 2) over(partition by esz order by ranks)) is not NULL) 
               THEN CASE
                  WHEN ((lag(times) over(partition by esz order by ranks)) is NULL)
                      THEN (lead(times, 2) over(partition by esz order by ranks) - 2)
                  ELSE CASE
                      WHEN((lag(times) over(partition by esz order by ranks) + 1) <
                            (lead(times, 2) over(partition by esz order by ranks) - 2))
                          THEN (lag(times) over(partition by esz order by ranks) + 1)
                      ELSE (lead(times, 2) over(partition by esz order by ranks) - 2)
                  END
                END
          --    NOT NULL ||   NULL*   || ANY
          WHEN(times is NULL AND 
               (lag(times) over(partition by esz order by ranks)) is not NULL) 
               THEN ((lag(times) over(partition by esz order by ranks)) + 1)
          --       NULL  ||   NULL*   ||  NOT NULL
          WHEN(times is NULL AND 
               (lag(times) over(partition by esz order by ranks)) is NULL AND 
               (lead(times) over(partition by esz order by ranks)) is not NULL) 
               THEN ((lead(times) over(partition by esz order by ranks)) - 1)
          ELSE -999
      END as time_output
    from T1
    

    Run the Query: db<>fiddle

    Output with validation:
    enter image description here

    This solution work for two null scenario top or bottom or both, as follows:

    value | null | *(Current Position = NULL) | null | value
    value | null | *(Current Position = NULL) | anything
    anything | *(Current Position = NULL) | null | value
    

    So, go to db<>fiddle page and there I explain all conditions one by one.
    Please add the Case-03 (null-CurrentPos-null) scenario if required.

    I think you understand my approach. So, design your query according to that.