I have these two tables : old_table was created on 2020-01-01 and new_table was created on 2020-01-02.
CREATE TABLE old_table (
name1 VARCHAR(50),
name2 VARCHAR(50),
origin_date DATE,
var1 VARCHAR(50),
today DATE
);
INSERT INTO old_table (name1, name2, origin_date, var1, today) VALUES
('red_1', 'red', '2010-01-01', 'aaa', '2020-01-01'),
('red_2', 'red', '2011-01-01', 'bbb', '2020-01-01'),
('blue_1', 'blue', '2005-01-01', 'ccc', '2020-01-01'),
('green_1', 'green', '2005-01-01', 'ddd', '2020-01-01');
CREATE TABLE new_table (
name1 VARCHAR(50),
name2 VARCHAR(50),
origin_date DATE,
var1 VARCHAR(50),
today DATE
);
INSERT INTO new_table (name1, name2, origin_date, var1, today) VALUES
('purple_1', 'purple', '2001-01-01', 'fff', '2020-01-02'),
('pink_1', 'pink', '2002-01-01', 'ggg', '2020-01-02'),
('red_1', 'red', '2010-01-01', 'aaa', '2020-01-02');
I am trying to accomplish the following task:
As an example: in the final dataset (i.e. after the sql code) ... green would have a origin date of 2005-01-01 and have an end_date of 2020-01-02 .red_1 survived in both old_table and new_table. therefore, red_1 can not have an end_date and its status must still be active.
The final result should look like this:
name1 name2 origin_date var1 status end_date
red_1 red 2010-01-01 aaa active <NA>
red_2 red 2011-01-01 bbb inactive 2020-01-02
blue_1 blue 2005-01-01 ccc inactive 2020-01-02
green_1 green 2005-01-01 ddd inactive 2020-01-02
purple_1 purple 2001-01-01 fff active <NA>
pink_1 pink 2002-01-01 ggg active <NA>
I tried to write the following code using CTEs:
WITH combined AS (
SELECT
old_table.name1,
old_table.name2,
old_table.origin_date,
old_table.var1,
new_table.today AS end_date,
CASE WHEN new_table.name1 IS NULL THEN 'inactive' ELSE 'active' END AS status
FROM
old_table
LEFT JOIN
new_table ON old_table.name1 = new_table.name1
UNION ALL
SELECT
new_table.name1,
new_table.name2,
new_table.origin_date,
new_table.var1,
NULL AS end_date,
'active' AS status
FROM
new_table
WHERE
new_table.name1 NOT IN (SELECT name1 FROM old_table)
)
SELECT * FROM combined;
The code ran:
name1 name2 origin_date var1 end_date status
red_1 red 2010-01-01 aaa NA active
red_2 red 2011-01-01 bbb NA inactive
blue_1 blue 2005-01-01 ccc NA inactive
green_1 green 2005-01-01 ddd NA inactive
purple_1 purple 2001-01-01 fff NA active
pink_1 pink 2002-01-01 ggg NA active
Problem: all the end_dates are NA - when some of them should be non NA (i.e. red_1, purple_1, pink_1).
Can someone please show me how to fix this?
Thanks!
Have you considered using full outer join
? ref
select
coalesce(n.name1,o.name1) name1
, coalesce(n.name2,o.name2) name2
, coalesce(n.origin_date,o.origin_date) origin_date
, coalesce(n.var1,o.var1) var1
, case when n.name1 IS NULL then o.today end end_date
, case when n.name1 IS NULL then 'inactive' else 'active' end status
from new_table n
full outer join old_table o on n.name1 = o.name1
order by
coalesce(n.origin_date,o.origin_date) DESC
, coalesce(n.name1,o.name1)
name1 | name2 | origin_date | var1 | end_date | status |
---|---|---|---|---|---|
red_2 | red | 2011-01-01 | bbb | 2020-01-01 | inactive |
red_1 | red | 2010-01-01 | aaa | null | active |
blue_1 | blue | 2005-01-01 | ccc | 2020-01-01 | inactive |
green_1 | green | 2005-01-01 | ddd | 2020-01-01 | inactive |
pink_1 | pink | 2002-01-01 | ggg | null | active |
purple_1 | purple | 2001-01-01 | fff | null | active |