sqlpostgresqlmaterialized-views

Fetch the column used for sorting in multiple orderby parameters postgres


I am using the below query to create a materialized view schedule_state from the table schedule

CREATE MATERIALIZED VIEW schedule_state AS (
    WITH schedule_latest_events AS (
        SELECT
            *,
            row_number() over (
                PARTITION BY key_id
                ORDER BY
                    header_event_timestamp DESC,
                    ktimestamp DESC,
                    raw_load_timestamp DESC,
                    update_timestamp DESC
            ) AS row_number
        FROM
            schedule
    )
    SELECT
        *
    FROM
        schedule_latest_events
    WHERE
        row_number = 1
);

As I am using multiple columns to order by, is there any way I can figure out which column was used for sorting for the column with rank 1?

Also is it possible to get the max of four order by columns and assign the result to a column in the materialized view? Thanks!

Table schedule

key_id,   header_event_timestamp,              ktimestamp,             raw_load_timestamp,         update_timestamp
k1,    2023-12-22 08:50:59.930000,  2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000
k1,    2023-12-22 08:50:37.530000,  2023-12-22 08:50:37.530000, 2023-12-22 08:52:36.960000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-20 15:09:36.370000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 05:41:03.080000,  2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000

MATERIALIZED VIEW schedule_state

key_id,   header_event_timestamp,              ktimestamp,             raw_load_timestamp,         update_timestamp         row_number
k1,    2023-12-22 08:50:59.930000,  2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000          1
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000          1
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000          1
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000          1

In the table


Solution

  • It is possible to determine the first column, among several defining an order, that discriminate between the first 2 records.

    I have prepared the full example below, that I immediately comment with:

    WITH Schedule(key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp) AS ( VALUES
    ('ksingle', '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
    ('kequal',  '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
    ('kequal',  '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
    ('k1',      '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
    ('k1',      '2023-12-22 08:50:37.530000',  '2023-12-22 08:50:37.530000', '2023-12-22 08:52:36.960000', '2023-12-22 06:41:02.483000'),
    ('k2',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
    ('k2',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:45:03.060000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
    ('k2',      '2023-12-22 06:41:03.080000',  '2023-12-20 15:09:36.370000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
    ('k3',      '2023-12-20 14:39:00.909832',  '2023-12-20 14:39:00.909000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
    ('k3',      '2023-12-20 14:39:00.909832',  '2023-12-20 14:39:00.909000', '2023-12-21 02:55:37.557000', '2023-12-20 14:04:37.323000'),
    ('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
    ('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
    ('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
    ('k4',      '2023-12-22 05:41:03.080000',  '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000')
    ), schedule_latest_events AS (
        SELECT *,
               ROW_NUMBER()                         OVER w1 AS row_number,
               nth_value(header_event_timestamp, 2) OVER w2 AS next_header_event_timestamp,
               nth_value(ktimestamp            , 2) OVER w2 AS next_ktimestamp,
               nth_value(raw_load_timestamp    , 2) OVER w2 AS next_raw_load_timestamp,
               nth_value(update_timestamp      , 2) OVER w2 AS next_update_timestamp
        FROM Schedule
        WINDOW
        w1 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC),
        w2 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    )
    SELECT *,
           header_event_timestamp = next_header_event_timestamp AS test_header_event_timestamp,
           ktimestamp             = next_ktimestamp             AS test_ktimestamp,
           raw_load_timestamp     = next_raw_load_timestamp     AS test_raw_load_timestamp,
           update_timestamp       = next_update_timestamp       AS test_update_timestamp,
           array_position(ARRAY[
               header_event_timestamp = next_header_event_timestamp,
               ktimestamp             = next_ktimestamp,
               raw_load_timestamp     = next_raw_load_timestamp,
               update_timestamp       = next_update_timestamp
           ], 'false') AS first_different_column
    FROM schedule_latest_events
    WHERE row_number = 1
    

    Since I made the query to illustrate several ways to represent the result you want, there is some redundancy in what the above query will return to you. I let you remove the columns you do not want to keep.


    EDIT: @MatBailie has shared a version of the above query that uses the LEAD window function instead of NTH_VALUE and, importantly, only 1 window. That is only a minor code change compared to the above, with no difference on the result.

    Until I find a way to generalize the above query (as I explained in the comment my initial attempt was to see if I could do something with a call to generate_series(2, ...)) (there is no guarantee I ever will be able to find one), his version is superior.