sqlnetezza

SQL: Counting Number of Occurrences Past a Certain Date


I have these two tables:

CREATE TABLE my_table1 (
    name VARCHAR(50),
    var1 DATE,
    var2 INT
);


INSERT INTO my_table1 (name, var1, var2) VALUES
('john', '2010-01-01', 94),
('john', '2010-01-04', 106),
('john', '2015-01-01', 99),
('alex', '2010-01-01', 96),
('alex', '2018-01-01', 96),
('sara', '2005-01-01', 94),
('sara', '2006-01-01', 90),
('tim',  '1999-01-01', 101);

CREATE TABLE my_table2 (
    name VARCHAR(50),
    var3 DATE,
    var4 CHAR(1)
);

INSERT INTO my_table2 (name, var3, var4) VALUES
('john', '2001-01-01', 'a'),
('john', '2002-01-01', 'b'),
('alex', '2021-01-01', 'c'),
('alex', '2022-01-01', 'd'),
('sara', '1999-01-01', 'e'),
('sara', '2023-01-01', 'f');

I am trying to answer this question :

I was not sure how to proceed:

SELECT 
    t1.*, 
    t2.var3, 
    t2.var4, 
    COUNT(t2.name) OVER (PARTITION BY t1.name, t1.var1) as count
FROM 
    my_table1 t1
INNER JOIN 
    my_table2 t2 
ON 
    t1.name = t2.name AND 
    EXTRACT(YEAR FROM t1.var1) = EXTRACT(YEAR FROM t2.var3) AND 
    t2.var3 > t1.var1;

The final answer should look like this:

 name       date var2 count
 alex 2010-01-01   96     2
 alex 2018-01-01   96     2
 sara 2005-01-01   94     1
 sara 2006-01-01   90     1
 john 2010-01-01   94     0
 john 2010-01-04  106     0
 john 2015-01-01   99     0
  tim 1999-01-01  101     0

Is this the correct way to work on these problems?


Solution

  • problem: full join my_table1 and my_table2 based on name. after the join, only keep rows where the date from my_table2 is greater than the date from my_table1. when this is done (i.e. in the joined file), for each row from my_table1, count how many rows from my table_2 were retained

    To only keep the most recent row the easiest way is to 1: do a UNION of the two tables, and then 2: do a nested select to get only the top dates. And 3: finally group these by name to get the count.

    SELECT name, max(var1), var2 from (SELECT t1.name as name,t1.var1 as var1,t1.var2 as var2, count(name) FROM t1 union select t2.name as name, t2.var3 as var1, t2.var4 as var2 from t2) group by name
    

    That should work. UNION connects the two tables as one so you can select from both tables as one. Unsure if it would be possible to do the max() command directly in the first select, so to make sure I used a nested SELECT for that.