I am using the below query to create a materialized view schedule_state from the table schedule
CREATE MATERIALIZED VIEW schedule_state AS (
WITH schedule_latest_events AS (
SELECT
*,
row_number() over (
PARTITION BY key_id
ORDER BY
header_event_timestamp DESC,
ktimestamp DESC,
raw_load_timestamp DESC,
update_timestamp DESC
) AS row_number
FROM
schedule
)
SELECT
*
FROM
schedule_latest_events
WHERE
row_number = 1
);
As I am using multiple columns to order by, is there any way I can figure out which column was used for sorting for the column with rank 1?
Also is it possible to get the max of four order by columns and assign the result to a column in the materialized view? Thanks!
Table schedule
key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp
k1, 2023-12-22 08:50:59.930000, 2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000
k1, 2023-12-22 08:50:37.530000, 2023-12-22 08:50:37.530000, 2023-12-22 08:52:36.960000, 2023-12-22 06:41:02.483000
k2, 2023-12-22 06:41:03.080000, 2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2, 2023-12-22 06:41:03.080000, 2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2, 2023-12-22 06:41:03.080000, 2023-12-20 15:09:36.370000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3, 2023-12-20 14:39:00.909832, 2023-12-20 14:39:00.909000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3, 2023-12-20 14:39:00.909832, 2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000
k4, 2023-12-22 06:41:03.080000, 2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k4, 2023-12-22 06:41:03.080000, 2023-12-22 06:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4, 2023-12-22 06:41:03.080000, 2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4, 2023-12-22 05:41:03.080000, 2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
MATERIALIZED VIEW schedule_state
key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp row_number
k1, 2023-12-22 08:50:59.930000, 2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000 1
k2, 2023-12-22 06:41:03.080000, 2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000 1
K3, 2023-12-20 14:39:00.909832, 2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000 1
k4, 2023-12-22 06:41:03.080000, 2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000 1
In the table
It is possible to determine the first column, among several defining an order, that discriminate between the first 2 records.
I have prepared the full example below, that I immediately comment with:
Schedule
CTE that you will need to remove yourself to get the real query. In that CTE:
key_id = ksingle
, to illustrate what happens when a record is alone.key_id = kequal
are the same, to illustrate what happens in case of a perfect tie.WINDOW
clause of the query in order to avoid repeats + have them next to each other for clarity.w2
, as I defined it with ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
, if you were to remove the final filter (WHERE row_number = 1
), you would end up with a comparison of all records vs their respective next one. This is the effect of having a range that starts at the current row (row #2 is always the next row) rather than being e.g. UNBOUNDED PRECEDING
(for which, row #2 is the actual second one for the whole partition group).nth_value
, with parameter 2
).array_position
is applied, which returns the first occurrence of false
.WITH Schedule(key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp) AS ( VALUES
('ksingle', '2023-12-22 08:50:59.930000', '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('kequal', '2023-12-22 08:50:59.930000', '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('kequal', '2023-12-22 08:50:59.930000', '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('k1', '2023-12-22 08:50:59.930000', '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('k1', '2023-12-22 08:50:37.530000', '2023-12-22 08:50:37.530000', '2023-12-22 08:52:36.960000', '2023-12-22 06:41:02.483000'),
('k2', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k2', '2023-12-22 06:41:03.080000', '2023-12-22 06:45:03.060000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k2', '2023-12-22 06:41:03.080000', '2023-12-20 15:09:36.370000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
('k3', '2023-12-20 14:39:00.909832', '2023-12-20 14:39:00.909000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
('k3', '2023-12-20 14:39:00.909832', '2023-12-20 14:39:00.909000', '2023-12-21 02:55:37.557000', '2023-12-20 14:04:37.323000'),
('k4', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4', '2023-12-22 06:41:03.080000', '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4', '2023-12-22 05:41:03.080000', '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000')
), schedule_latest_events AS (
SELECT *,
ROW_NUMBER() OVER w1 AS row_number,
nth_value(header_event_timestamp, 2) OVER w2 AS next_header_event_timestamp,
nth_value(ktimestamp , 2) OVER w2 AS next_ktimestamp,
nth_value(raw_load_timestamp , 2) OVER w2 AS next_raw_load_timestamp,
nth_value(update_timestamp , 2) OVER w2 AS next_update_timestamp
FROM Schedule
WINDOW
w1 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC),
w2 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
)
SELECT *,
header_event_timestamp = next_header_event_timestamp AS test_header_event_timestamp,
ktimestamp = next_ktimestamp AS test_ktimestamp,
raw_load_timestamp = next_raw_load_timestamp AS test_raw_load_timestamp,
update_timestamp = next_update_timestamp AS test_update_timestamp,
array_position(ARRAY[
header_event_timestamp = next_header_event_timestamp,
ktimestamp = next_ktimestamp,
raw_load_timestamp = next_raw_load_timestamp,
update_timestamp = next_update_timestamp
], 'false') AS first_different_column
FROM schedule_latest_events
WHERE row_number = 1
Since I made the query to illustrate several ways to represent the result you want, there is some redundancy in what the above query will return to you. I let you remove the columns you do not want to keep.
EDIT: @MatBailie has shared a version of the above query that uses the LEAD
window function instead of NTH_VALUE
and, importantly, only 1 window. That is only a minor code change compared to the above, with no difference on the result.
Until I find a way to generalize the above query (as I explained in the comment my initial attempt was to see if I could do something with a call to generate_series(2, ...)
) (there is no guarantee I ever will be able to find one), his version is superior.