sqlnetezza

SQL: Comparing Identical Rows in Different Tables and then Updating


I have these two tables : old_table was created on 2020-01-01 and new_table was created on 2020-01-02.

CREATE TABLE old_table (
    name1 VARCHAR(50),
    name2 VARCHAR(50),
    origin_date DATE,
    var1 VARCHAR(50),
    today DATE
);


INSERT INTO old_table (name1, name2, origin_date, var1, today) VALUES
('red_1', 'red', '2010-01-01', 'aaa', '2020-01-01'),
('red_2', 'red', '2011-01-01', 'bbb', '2020-01-01'),
('blue_1', 'blue', '2005-01-01', 'ccc', '2020-01-01'),
('green_1', 'green', '2005-01-01', 'ddd', '2020-01-01');


CREATE TABLE new_table (
    name1 VARCHAR(50),
    name2 VARCHAR(50),
    origin_date DATE,
    var1 VARCHAR(50),
    today DATE
);


INSERT INTO new_table (name1, name2, origin_date, var1, today) VALUES
('purple_1', 'purple', '2001-01-01', 'fff', '2020-01-02'),
('pink_1', 'pink', '2002-01-01', 'ggg', '2020-01-02'),
('red_1', 'red', '2010-01-01', 'aaa', '2020-01-02');

I am trying to accomplish the following task:

As an example: in the final dataset (i.e. after the sql code) ... green would have a origin date of 2005-01-01 and have an end_date of 2020-01-02 .red_1 survived in both old_table and new_table. therefore, red_1 can not have an end_date and its status must still be active.

The final result should look like this:

     name1  name2 origin_date var1  status   end_date
    red_1    red  2010-01-01  aaa  active       <NA>
    red_2    red  2011-01-01  bbb inactive 2020-01-02
   blue_1   blue  2005-01-01  ccc inactive 2020-01-02
  green_1  green  2005-01-01  ddd inactive 2020-01-02
purple_1 purple  2001-01-01  fff  active       <NA>
  pink_1   pink  2002-01-01  ggg  active       <NA>

I tried to write the following code using CTEs:

WITH combined AS (
    SELECT 
        old_table.name1, 
        old_table.name2, 
        old_table.origin_date, 
        old_table.var1, 
        new_table.today AS end_date, 
        CASE WHEN new_table.name1 IS NULL THEN 'inactive' ELSE 'active' END AS status
    FROM 
        old_table
    LEFT JOIN 
        new_table ON old_table.name1 = new_table.name1
    UNION ALL
    SELECT 
        new_table.name1, 
        new_table.name2, 
        new_table.origin_date, 
        new_table.var1, 
        NULL AS end_date, 
        'active' AS status
    FROM 
        new_table
    WHERE 
        new_table.name1 NOT IN (SELECT name1 FROM old_table)
)
SELECT * FROM combined;

The code ran:

    name1  name2 origin_date var1 end_date   status
    red_1    red  2010-01-01  aaa       NA   active
    red_2    red  2011-01-01  bbb       NA inactive
   blue_1   blue  2005-01-01  ccc       NA inactive
  green_1  green  2005-01-01  ddd       NA inactive
 purple_1 purple  2001-01-01  fff       NA   active
   pink_1   pink  2002-01-01  ggg       NA   active

Problem: all the end_dates are NA - when some of them should be non NA (i.e. red_1, purple_1, pink_1).

Can someone please show me how to fix this?

Thanks!


Solution

  • Have you considered using full outer join? ref

    select
          coalesce(n.name1,o.name1) name1
        , coalesce(n.name2,o.name2) name2
        , coalesce(n.origin_date,o.origin_date) origin_date
        , coalesce(n.var1,o.var1) var1
        , case when n.name1 IS NULL then o.today end end_date
        , case when n.name1 IS NULL then 'inactive' else 'active' end status
    from new_table n
    full outer join old_table o on n.name1 = o.name1
    order by
          coalesce(n.origin_date,o.origin_date) DESC 
        , coalesce(n.name1,o.name1) 
    
    name1 name2 origin_date var1 end_date status
    red_2 red 2011-01-01 bbb 2020-01-01 inactive
    red_1 red 2010-01-01 aaa null active
    blue_1 blue 2005-01-01 ccc 2020-01-01 inactive
    green_1 green 2005-01-01 ddd 2020-01-01 inactive
    pink_1 pink 2002-01-01 ggg null active
    purple_1 purple 2001-01-01 fff null active

    fiddle