sqlnetezza

SQL; Ensuring a Condition is Met After the Join


I have these two tables:

CREATE TABLE my_table1 (
    name VARCHAR(50),
    var1 DATE,
    var2 INT
);


INSERT INTO my_table1 (name, var1, var2) VALUES
('john', '2010-01-01', 94),
('john', '2010-01-04', 106),
('john', '2015-01-01', 99),
('alex', '2010-01-01', 96),
('alex', '2018-01-01', 96),
('sara', '2005-01-01', 94),
('sara', '2006-01-01', 90),
('tim',  '1999-01-01', 101);

CREATE TABLE my_table2 (
    name VARCHAR(50),
    var3 DATE,
    var4 CHAR(1)
);

INSERT INTO my_table2 (name, var3, var4) VALUES
('john', '2001-01-01', 'a'),
('john', '2002-01-01', 'b'),
('alex', '2021-01-01', 'c'),
('alex', '2022-01-01', 'd'),
('sara', '1999-01-01', 'e'),
('sara', '2023-01-01', 'f');

I am trying to answer this problem:

problem 1: for each name in my_table2, find the most recent row (based on date). join this row to my_table1. however, after the join - ensure that the date from my_table2 is greater than the date from my_table1 (if not, then remove). The final result should have the same number of rows as my_table1.

For problem 1, I tried to solve the problem like this:

# problem 1

SELECT t1.*, t2.*
FROM my_table1 t1
JOIN (
    SELECT name, var3, var4
    FROM (
        SELECT name, var3, var4,
            ROW_NUMBER() OVER (PARTITION BY name ORDER BY var3 DESC) as rn
        FROM my_table2
    ) tmp
    WHERE rn = 1
) t2
ON t1.name = t2.name
WHERE t1.var1 < t2.var3;

The code partly produces the correct output, but I can't figure out how to include a NA row for John and Tim here

Can someone please show me how to do this correctly?

Thanks!

Note: I tried an approach with the COALESCE function. This produced a row for Tim, but it won't produce wont a row for John:

SELECT t1.*, COALESCE(t2.name, 'NA') as name, t2.var3, t2.var4
FROM my_table1 t1
LEFT JOIN (
    SELECT name, var3, var4
    FROM (
        SELECT name, var3, var4,
            ROW_NUMBER() OVER (PARTITION BY name ORDER BY var3 DESC) as rn
        FROM my_table2
    ) tmp
    WHERE rn = 1
) t2
ON t1.name = t2.name
WHERE t1.var1 < COALESCE(t2.var3, '9999-12-31');

Solution

  • | for each name in my_table2, find the most recent row (based on date)
    There are 3 distinct names in my_table2 so you need 3 most recent rows of data

    Select    *
    From    ( Select    NAME,  VAR3, VAR4, 
                        Max(VAR3) Over(Partition By NAME) "LAST_DATE"
              From      my_table2 )
    Where     VAR3 = LAST_DATE
    --
    /*    R e s u l t :
    NAME                VAR3     VAR4 LAST_DATE
    ------------------- -------- ---- ---------
    alex                01.01.22 d    01.01.22
    john                01.01.02 b    01.01.02
    sara                01.01.23 f    01.01.23  */
    

    | join this row to my_table1
    | after the join - ensure that the date from my_table2 is greater than the date from my_table1
    | (if not, then remove)
    | The final result should have the same number of rows as my_table1
    Theese instructions are contradictory. If you want the number of rows returned to be the same as number of rows in my_table1 then:

    Select      t1.*, t2.VAR3, t2.VAR4
    From        my_table1 t1
    Left  Join  ( Select    *
                  From    ( Select    NAME, VAR3, VAR4, 
                                      Max(VAR3) Over(Partition By NAME) as LAST_DATE
                            From      my_table2 
                          )
                  Where     VAR3 = LAST_DATE
                ) t2 ON t2.name = t1.name And
                        t2.LAST_DATE > t1.VAR1
    
    /*    R e s u l t :
    NAME               VAR1           VAR2   VAR3       VAR4
    ------------------ ---------- ---------- ---------- -------
    alex               01.01.10           96 01.01.22   d
    alex               01.01.18           96 01.01.22   d
    sara               01.01.05           94 01.01.23   f
    sara               01.01.06           90 01.01.23   f
    john               01.01.10           94           
    john               04.01.10          106           
    john               01.01.15           99           
    tim                01.01.99          101                    */
    

    But if the task is to remove not matching rows then you can:

    1. put INNER JOIN instead of Left Join.
    2. leave left join and move date condition from join's ON clause to query's Where clause
    Select      t1.*, t2.VAR3, t2.VAR4
    From        my_table1 t1
    Left  Join  ( Select    *
                  From    ( Select    NAME, VAR3, VAR4, 
                                      Max(VAR3) Over(Partition By NAME) as LAST_DATE
                            From      my_table2 
                          )
                  Where     VAR3 = LAST_DATE
                ) t2 ON t2.name = t1.name 
    Where t2.LAST_DATE > t1.VAR1
    

    Both will remove john and tim rows from the result. First one within the join (ON clause) and second after the join (if that is what one of the instructions mean) using Where clause

    /*    R e s u l t :
    NAME               VAR1           VAR2   VAR3       VAR4
    ------------------ ---------- ---------- ---------- -------
    alex               01.01.10           96 01.01.22   d
    alex               01.01.18           96 01.01.22   d
    sara               01.01.05           94 01.01.23   f
    sara               01.01.06           90 01.01.23   f          */