postgresqltime-seriesanti-join

Collectiong Anti-Join Results via Window Function?


I have a table:

CREATE TABLE t_table (
    c_id int4 NOT NULL,
    c_date_called int4 NOT NULL,
    CONSTRAINT t_table_un UNIQUE (c_id, c_date_called)
);

that stores distinct snapshots of data, with data as such:

INSERT INTO t_table (c_id, c_date_called) 
   VALUES
(1,9),
(2,9),
(3,9),
(4,9),
(5,9),
(1,12),
(2,12),
(3,12),
(5,12),
(1,17),
(3,17)
;

Against this table I can run an anti-join, comparing the current snapshot and one previous snapshot:

--EXPLAIN ANALYSE VERBOSE
SELECT prev.*
    FROM t_table AS prev
    LEFT JOIN t_table AS cur ON (prev.c_id = cur.c_id) AND cur.c_date_called = 17
    WHERE prev.c_date_called = 9
        AND cur.c_id IS NULL
        ;

returns the data I expect when finding the IDs not present in the current c_date_called:

c_id|c_date_called|
----+-------------+
   2|            9|
   4|            9|

But how do I apply the anti-join across multiple distinct c_date_called collecting the results and merging them as compared against the current c_date_called?

Well. Window functions with anti-joins.. yeah need help.


Solution

  • Well to report the complete state of the iss in the snapshots, i.e. a snapshot with a new id not present in the previous snapshot resp. the id removed, i.e. not present in the next snapshot, you do not need an anti-join. Which you do not use in your example anyway.

    First define the sequence of the snapshots using integers and flag the current snapshot

    with snap as (
    select distinct c_date_called from t_table),
    snap2 as (
    select  
      c_date_called,
      row_number() over(order by c_date_called) as snap_idx,
      c_date_called = max(c_date_called) over() is_current
    from snap)
    select * from snap2;
    
    c_date_called|snap_idx|is_current|
    -------------+--------+----------+
                9|       1|false     |
               12|       2|false     |
               17|       3|true      |
    

    Than join this supporting information to your main table and add two attribuites based on the lag and lead window function of the snapshot index that identify if the previous / next snapshot with the given id is consecutive or if there is a gap. The logic should be pretty self-explained.

    with snap as (
    select distinct c_date_called from t_table),
    snap2 as (
    select  
      c_date_called,
      row_number() over(order by c_date_called) as snap_idx,
      c_date_called = max(c_date_called) over() is_current
    from snap)
    select 
     a.c_id, a.c_date_called, b.snap_idx, b.is_current,
     snap_idx != (1 + lag(snap_idx,1,0) over(partition by a.c_id order by snap_idx)) is_new,
     not (is_current) and snap_idx + 1 !=lead(snap_idx,1,0) over(partition by a.c_id order by snap_idx) is_removed
    from t_table a
    join snap2 b on a.c_date_called = b.c_date_called
    order by 1,3;
    
    c_id|c_date_called|snap_idx|is_current|is_new|is_removed|
    ----+-------------+--------+----------+------+----------+
       1|            9|       1|false     |false |false     |
       1|           12|       2|false     |false |false     |
       1|           17|       3|true      |false |false     |
       2|            9|       1|false     |false |false     |
       2|           12|       2|false     |false |true      |
       3|            9|       1|false     |false |false     |
       3|           12|       2|false     |false |false     |
       3|           17|       3|true      |false |false     |
       4|            9|       1|false     |false |true      |
       5|            9|       1|false     |false |false     |
       5|           12|       2|false     |false |true      |
       6|           12|       2|false     |true  |false     |
       6|           17|       3|true      |false |false     |
    

    Note that I added id 6 that was introduced in the second snapshot to demonstrated this use case.