I have these two tables:
CREATE TABLE my_table1 (
name VARCHAR(50),
var1 DATE,
var2 INT
);
INSERT INTO my_table1 (name, var1, var2) VALUES
('john', '2010-01-01', 94),
('john', '2010-01-04', 106),
('john', '2015-01-01', 99),
('alex', '2010-01-01', 96),
('alex', '2018-01-01', 96),
('sara', '2005-01-01', 94),
('sara', '2006-01-01', 90),
('tim', '1999-01-01', 101);
CREATE TABLE my_table2 (
name VARCHAR(50),
var3 DATE,
var4 CHAR(1)
);
INSERT INTO my_table2 (name, var3, var4) VALUES
('john', '2001-01-01', 'a'),
('john', '2002-01-01', 'b'),
('alex', '2021-01-01', 'c'),
('alex', '2022-01-01', 'd'),
('sara', '1999-01-01', 'e'),
('sara', '2023-01-01', 'f');
I am trying to answer this problem:
problem 1: for each name in my_table2, find the most recent row (based on date). join this row to my_table1. however, after the join - ensure that the date from my_table2 is greater than the date from my_table1 (if not, then remove). The final result should have the same number of rows as my_table1.
For problem 1, I tried to solve the problem like this:
# problem 1
SELECT t1.*, t2.*
FROM my_table1 t1
JOIN (
SELECT name, var3, var4
FROM (
SELECT name, var3, var4,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY var3 DESC) as rn
FROM my_table2
) tmp
WHERE rn = 1
) t2
ON t1.name = t2.name
WHERE t1.var1 < t2.var3;
The code partly produces the correct output, but I can't figure out how to include a NA row for John and Tim here
Can someone please show me how to do this correctly?
Thanks!
Note: I tried an approach with the COALESCE function. This produced a row for Tim, but it won't produce wont a row for John:
SELECT t1.*, COALESCE(t2.name, 'NA') as name, t2.var3, t2.var4
FROM my_table1 t1
LEFT JOIN (
SELECT name, var3, var4
FROM (
SELECT name, var3, var4,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY var3 DESC) as rn
FROM my_table2
) tmp
WHERE rn = 1
) t2
ON t1.name = t2.name
WHERE t1.var1 < COALESCE(t2.var3, '9999-12-31');
| for each name in my_table2, find the most recent row (based on date)
There are 3 distinct names in my_table2 so you need 3 most recent rows of data
Select *
From ( Select NAME, VAR3, VAR4,
Max(VAR3) Over(Partition By NAME) "LAST_DATE"
From my_table2 )
Where VAR3 = LAST_DATE
--
/* R e s u l t :
NAME VAR3 VAR4 LAST_DATE
------------------- -------- ---- ---------
alex 01.01.22 d 01.01.22
john 01.01.02 b 01.01.02
sara 01.01.23 f 01.01.23 */
| join this row to my_table1
| after the join - ensure that the date from my_table2 is greater than the date from my_table1
| (if not, then remove)
| The final result should have the same number of rows as my_table1
Theese instructions are contradictory. If you want the number of rows returned to be the same as number of rows in my_table1 then:
Select t1.*, t2.VAR3, t2.VAR4
From my_table1 t1
Left Join ( Select *
From ( Select NAME, VAR3, VAR4,
Max(VAR3) Over(Partition By NAME) as LAST_DATE
From my_table2
)
Where VAR3 = LAST_DATE
) t2 ON t2.name = t1.name And
t2.LAST_DATE > t1.VAR1
/* R e s u l t :
NAME VAR1 VAR2 VAR3 VAR4
------------------ ---------- ---------- ---------- -------
alex 01.01.10 96 01.01.22 d
alex 01.01.18 96 01.01.22 d
sara 01.01.05 94 01.01.23 f
sara 01.01.06 90 01.01.23 f
john 01.01.10 94
john 04.01.10 106
john 01.01.15 99
tim 01.01.99 101 */
But if the task is to remove not matching rows then you can:
Select t1.*, t2.VAR3, t2.VAR4
From my_table1 t1
Left Join ( Select *
From ( Select NAME, VAR3, VAR4,
Max(VAR3) Over(Partition By NAME) as LAST_DATE
From my_table2
)
Where VAR3 = LAST_DATE
) t2 ON t2.name = t1.name
Where t2.LAST_DATE > t1.VAR1
Both will remove john and tim rows from the result. First one within the join (ON clause) and second after the join (if that is what one of the instructions mean) using Where clause
/* R e s u l t :
NAME VAR1 VAR2 VAR3 VAR4
------------------ ---------- ---------- ---------- -------
alex 01.01.10 96 01.01.22 d
alex 01.01.18 96 01.01.22 d
sara 01.01.05 94 01.01.23 f
sara 01.01.06 90 01.01.23 f */