sql db2

Summing between dates


I have these two tables in SQL:

CREATE TABLE myt1 
(
    name VARCHAR(50),
    date DATE,
    hours_watched DECIMAL(4,2)
);

INSERT INTO myt1 VALUES
('name1', '2024-01-15', 2.5),
('name1', '2024-03-05', 1.5),
('name1', '2024-06-10', 3.0),
('name2', '2024-01-20', 3.0),
('name2', '2024-04-15', 4.5),
('name3', '2024-02-10', 2.0),
('name3', '2024-07-15', 5.5),
('name4', '2024-12-01', 4.0);

CREATE TABLE myt2 
(
    name VARCHAR(50),
    team VARCHAR(50),
    subteam INTEGER,
    date_from DATE,
    date_to DATE
);

INSERT INTO myt2 VALUES
('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
('name3', 'Team B', 1, '2024-01-01', '2024-06-30');

Individuals on these teams watch movies. I want to find the total hours spent watching movies for each team. If someone moves between teams, their hours should be credited to the team they were on at the time of watching.

I tried to do this with a simple join:

SELECT 
    t2.team,
    SUM(t1.hours_watched) AS total_hours
FROM 
    myt1 t1
LEFT JOIN 
    myt2 t2 ON t1.name = t2.name 
            AND t1.date >= t2.date_from 
            AND t1.date <= t2.date_to
WHERE 
    t2.team IS NOT NULL
GROUP BY 
    t2.team;

But this does not work when someone's team cannot be located: those hours are dropped from the result.

Is it better to solve this problem using CTEs for individuals that can be located/not located, or is there some easier option here?


SELECT 
    COALESCE(t2.team, 'UNMAPPED') AS team,
    SUM(t1.hours_watched) AS total_hours
FROM myt1 t1
LEFT JOIN myt2 t2 
    ON t1.name = t2.name 
    AND t1.date >= t2.date_from 
    AND t1.date <= t2.date_to
GROUP BY COALESCE(t2.team, 'UNMAPPED');

Solution

  • No need for CTEs if a NULL team is enough

    CTEs are not "better for this or that specific problem"; they are simply a good general way to split a complex task into individual steps without getting lost.

    But here your task does seem simple enough, and I'd say the error just lies in forgetting to remove the WHERE t2.team IS NOT NULL:
    since you correctly used a LEFT JOIN, whose whole purpose is to keep the myt1 rows even when no team matches, this post-filter is surprising. It discards exactly those rows, so the join ends up behaving like an INNER JOIN.
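
    As a minimal sketch against your sample tables, this is your query with only the WHERE clause dropped. All hours without a matching team land in a single group whose team is NULL (9.50 here, that is name3's 5.50 plus name4's 4.00), next to Team A and Team B:

    ```sql
    -- Same join as in the question, without the post-filter on t2.team.
    -- Rows of myt1 with no matching period in myt2 keep t2.team = NULL
    -- and are summed together in one NULL group.
    SELECT
        t2.team,
        SUM(t1.hours_watched) AS total_hours
    FROM
        myt1 t1
    LEFT JOIN
        myt2 t2 ON t1.name = t2.name
                AND t1.date >= t2.date_from
                AND t1.date <= t2.date_to
    GROUP BY
        t2.team;
    ```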

    Optionally, attribute hours watched outside any team to the individual

    Of course, just removing the filter returns a single NULL team row, but if you want to identify by name the users who watched movies while not on any team, you can do any of the following:

    1. Simply COALESCE() with t1.name in both the SELECT and GROUP BY:

      SELECT 
          COALESCE(t2.team, '('||t1.name||' personally)') AS team,
          SUM(t1.hours_watched) AS total_hours
      FROM 
          myt1 t1
      LEFT JOIN 
          myt2 t2 ON t1.name = t2.name 
                  AND t1.date >= t2.date_from 
                  AND t1.date <= t2.date_to
      GROUP BY 
          COALESCE(t2.team, '('||t1.name||' personally)');
      
    2. Use a CTE:

      WITH by_team_and_user AS
      (
          SELECT 
              COALESCE(t2.team, '('||t1.name||' personally)') AS team,
              SUM(t1.hours_watched) AS total_hours
          FROM 
              myt1 t1
          LEFT JOIN 
              myt2 t2 ON t1.name = t2.name 
                      AND t1.date >= t2.date_from 
                      AND t1.date <= t2.date_to
          GROUP BY
              t2.team,
              t1.name
      )
      SELECT team, SUM(total_hours) AS total_hours
      FROM by_team_and_user
      GROUP BY team;
      
    3. Use a subquery:

      SELECT team, SUM(total_hours) AS total_hours
      FROM
      (
          SELECT 
              COALESCE(t2.team, '('||t1.name||' personally)') AS team,
              SUM(t1.hours_watched) AS total_hours
          FROM 
              myt1 t1
          LEFT JOIN 
              myt2 t2 ON t1.name = t2.name 
                      AND t1.date >= t2.date_from 
                      AND t1.date <= t2.date_to
          GROUP BY
              t2.team,
              t1.name
      )
      AS by_team_and_user
      GROUP BY team;
      

    Which one you pick is really up to you; all three return the same result:

    TEAM                TOTAL_HOURS
    (name3 personally)         5.50
    (name4 personally)         4.00
    Team A                    10.00
    Team B                     6.50

    that you can see in this fiddle.
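
    If you want to check which rows feed the "personally" buckets, a NOT EXISTS lookup along these lines should list them. This is only a sketch against the sample data above and is not one of the three options; with that data it returns name3's row of 2024-07-15 and name4's row of 2024-12-01:

    ```sql
    -- List the watched rows that fall outside every team period
    -- recorded for that person (or whose person has no period at all).
    SELECT
        t1.name,
        t1.date,
        t1.hours_watched
    FROM
        myt1 t1
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM myt2 t2
        WHERE t2.name = t1.name
          AND t1.date BETWEEN t2.date_from AND t2.date_to  -- inclusive on both ends, like >= and <=
    )
    ORDER BY
        t1.name,
        t1.date;
    ```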