I have a table:
CREATE TABLE t_table (
c_id int4 NOT NULL,
c_date_called int4 NOT NULL,
CONSTRAINT t_table_un UNIQUE (c_id, c_date_called)
);
that stores distinct snapshots of data, with data as such:
INSERT INTO t_table (c_id, c_date_called)
VALUES
(1,9),
(2,9),
(3,9),
(4,9),
(5,9),
(1,12),
(2,12),
(3,12),
(5,12),
(1,17),
(3,17)
;
Against this table I can run an anti-join, comparing the current
snapshot and one previous
snapshot:
--EXPLAIN ANALYSE VERBOSE
SELECT prev.*
FROM t_table AS prev
LEFT JOIN t_table AS cur ON (prev.c_id = cur.c_id) AND cur.c_date_called = 17
WHERE prev.c_date_called = 9
AND cur.c_id IS NULL
;
returns the data I expect when finding the IDs not present in the current c_date_called
:
c_id|c_date_called|
----+-------------+
2| 9|
4| 9|
But how do I apply the anti-join across multiple distinct c_date_called
collecting the results and merging them as compared against the current
c_date_called
?
Well. Window functions with anti-joins.. yeah need help.
Well to report the complete state of the is
s in the snapshots, i.e. a snapshot with a new id
not present in the previous snapshot resp. the id
removed, i.e. not present in the next snapshot, you do not need an anti-join. Which you do not use in your example anyway.
First define the sequence of the snapshots using integers and flag the current snapshot
with snap as (
select distinct c_date_called from t_table),
snap2 as (
select
c_date_called,
row_number() over(order by c_date_called) as snap_idx,
c_date_called = max(c_date_called) over() is_current
from snap)
select * from snap2;
c_date_called|snap_idx|is_current|
-------------+--------+----------+
9| 1|false |
12| 2|false |
17| 3|true |
Than join this supporting information to your main table and add two attribuites based on the lag
and lead
window function of the snapshot index that identify if the previous / next snapshot with the given id
is consecutive or if there is a gap. The logic should be pretty self-explained.
with snap as (
select distinct c_date_called from t_table),
snap2 as (
select
c_date_called,
row_number() over(order by c_date_called) as snap_idx,
c_date_called = max(c_date_called) over() is_current
from snap)
select
a.c_id, a.c_date_called, b.snap_idx, b.is_current,
snap_idx != (1 + lag(snap_idx,1,0) over(partition by a.c_id order by snap_idx)) is_new,
not (is_current) and snap_idx + 1 !=lead(snap_idx,1,0) over(partition by a.c_id order by snap_idx) is_removed
from t_table a
join snap2 b on a.c_date_called = b.c_date_called
order by 1,3;
c_id|c_date_called|snap_idx|is_current|is_new|is_removed|
----+-------------+--------+----------+------+----------+
1| 9| 1|false |false |false |
1| 12| 2|false |false |false |
1| 17| 3|true |false |false |
2| 9| 1|false |false |false |
2| 12| 2|false |false |true |
3| 9| 1|false |false |false |
3| 12| 2|false |false |false |
3| 17| 3|true |false |false |
4| 9| 1|false |false |true |
5| 9| 1|false |false |false |
5| 12| 2|false |false |true |
6| 12| 2|false |true |false |
6| 17| 3|true |false |false |
Note that I added id
6 that was introduced in the second snapshot to demonstrated this use case.