I have these two tables in SQL:
CREATE TABLE myt1
(
name VARCHAR(50),
date DATE,
hours_watched DECIMAL(4,2)
);
INSERT INTO myt1 VALUES
('name1', '2024-01-15', 2.5),
('name1', '2024-03-05', 1.5),
('name1', '2024-06-10', 3.0),
('name2', '2024-01-20', 3.0),
('name2', '2024-04-15', 4.5),
('name3', '2024-02-10', 2.0),
('name3', '2024-07-15', 5.5),
('name4', '2024-12-01', 4.0);
CREATE TABLE myt2
(
name VARCHAR(50),
team VARCHAR(50),
subteam INTEGER,
date_from DATE,
date_to DATE
);
INSERT INTO myt2 VALUES
('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
Individuals on these teams watch movies. I want to find out how many total hours were spent watching movies for each team. I want to make sure that if someone is moving between teams, the hours get added to the team the person was on when he was watching.
I tried to do this with a simple join:
SELECT
t2.team,
SUM(t1.hours_watched) AS total_hours
FROM
myt1 t1
LEFT JOIN
myt2 t2 ON t1.name = t2.name
AND t1.date >= t2.date_from
AND t1.date <= t2.date_to
WHERE
t2.team IS NOT NULL
GROUP BY
t2.team;
But this will not work when someone's team cannot be located.
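A minimal sketch of that failure on the sample data. It assumes SQLite, driven through Python's standard `sqlite3` module, as a stand-in for whatever DBMS is actually in use. `make_db` and `JOIN_ON` are hypothetical helper names. The sketch compares the attempt's total with the sum of every row in `myt1`, and it lists the watch rows that have no team interval covering their date.

```python
import sqlite3

# Hypothetical helper: loads the question's sample data into an in-memory
# SQLite database (assumption: SQLite stands in for the unnamed DBMS).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myt1 (name VARCHAR(50), date DATE, hours_watched DECIMAL(4,2));
        INSERT INTO myt1 VALUES
          ('name1', '2024-01-15', 2.5), ('name1', '2024-03-05', 1.5),
          ('name1', '2024-06-10', 3.0), ('name2', '2024-01-20', 3.0),
          ('name2', '2024-04-15', 4.5), ('name3', '2024-02-10', 2.0),
          ('name3', '2024-07-15', 5.5), ('name4', '2024-12-01', 4.0);
        CREATE TABLE myt2 (name VARCHAR(50), team VARCHAR(50), subteam INTEGER,
                           date_from DATE, date_to DATE);
        INSERT INTO myt2 VALUES
          ('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
          ('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
          ('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
          ('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
    """)
    return conn

# Join condition from the question: same person, watch date inside the team interval.
JOIN_ON = "t1.name = t2.name AND t1.date >= t2.date_from AND t1.date <= t2.date_to"

conn = make_db()

# The original attempt, including the WHERE t2.team IS NOT NULL post-filter.
attempt = dict(conn.execute(f"""
    SELECT t2.team, SUM(t1.hours_watched) AS total_hours
    FROM myt1 t1 LEFT JOIN myt2 t2 ON {JOIN_ON}
    WHERE t2.team IS NOT NULL
    GROUP BY t2.team
""").fetchall())

# Every hour in myt1, regardless of team.
all_hours = conn.execute("SELECT SUM(hours_watched) FROM myt1").fetchone()[0]

# Watch rows whose date falls in none of the person's team intervals (anti-join).
unmatched = sorted(conn.execute(f"""
    SELECT t1.name, t1.date, t1.hours_watched
    FROM myt1 t1 LEFT JOIN myt2 t2 ON {JOIN_ON}
    WHERE t2.name IS NULL
""").fetchall())

missing = all_hours - sum(attempt.values())
print(attempt)
print(f"all: {all_hours}, missing: {missing}, from rows: {unmatched}")
```

With the sample data, the attempt accounts for 16.5 of the 26.0 hours. The missing 9.5 hours come from name3's July row and name4's only row.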
Is it better to solve this problem using CTEs for individuals that can be located/not located, or is there some easier option here?
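A simpler option is to keep the LEFT JOIN, drop the WHERE filter, and label the unmatched rows with COALESCE():
SELECT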
COALESCE(t2.team, 'UNMAPPED') AS team,
SUM(t1.hours_watched) AS total_hours
FROM myt1 t1
LEFT JOIN myt2 t2
ON t1.name = t2.name
AND t1.date >= t2.date_from
AND t1.date <= t2.date_to
GROUP BY COALESCE(t2.team, 'UNMAPPED');
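A quick sketch to check that this version keeps every hour. It again assumes SQLite through Python's `sqlite3` module. `make_db` is a hypothetical helper that loads the sample data.

```python
import sqlite3

# Hypothetical helper: loads the question's sample data into an in-memory
# SQLite database (assumption: SQLite stands in for the unnamed DBMS).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myt1 (name VARCHAR(50), date DATE, hours_watched DECIMAL(4,2));
        INSERT INTO myt1 VALUES
          ('name1', '2024-01-15', 2.5), ('name1', '2024-03-05', 1.5),
          ('name1', '2024-06-10', 3.0), ('name2', '2024-01-20', 3.0),
          ('name2', '2024-04-15', 4.5), ('name3', '2024-02-10', 2.0),
          ('name3', '2024-07-15', 5.5), ('name4', '2024-12-01', 4.0);
        CREATE TABLE myt2 (name VARCHAR(50), team VARCHAR(50), subteam INTEGER,
                           date_from DATE, date_to DATE);
        INSERT INTO myt2 VALUES
          ('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
          ('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
          ('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
          ('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
    """)
    return conn

conn = make_db()

# The COALESCE(..., 'UNMAPPED') query above, unchanged apart from layout.
totals = dict(conn.execute("""
    SELECT COALESCE(t2.team, 'UNMAPPED') AS team,
           SUM(t1.hours_watched) AS total_hours
    FROM myt1 t1
    LEFT JOIN myt2 t2
      ON t1.name = t2.name
     AND t1.date >= t2.date_from
     AND t1.date <= t2.date_to
    GROUP BY COALESCE(t2.team, 'UNMAPPED')
""").fetchall())

# Nothing is lost: the groups add up to every hour in myt1.
all_hours = conn.execute("SELECT SUM(hours_watched) FROM myt1").fetchone()[0]
print(totals, sum(totals.values()) == all_hours)
```

All unlocated people end up in a single UNMAPPED bucket. Here that bucket holds 9.5 hours from name3 and name4 combined. The groups add up to the raw total only as long as a person's team intervals never overlap, which holds for the sample data.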
CTEs are not "better" for one specific kind of problem. They are simply a good general way to split a complex task into individual steps so you don't get lost.
But your task seems simple enough here, and I'd say the only error is that you forgot to remove the WHERE t2.team IS NOT NULL. You correctly used a LEFT JOIN, which is precisely meant to keep watched rows even when they have no matching team, so this post-filter looked surprising. It throws away exactly those unmatched rows, so your LEFT JOIN effectively behaves like an INNER JOIN.
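To see why the post-filter is the problem, here is a small check. It is a sketch that assumes SQLite through Python's `sqlite3` module, and `make_db`, `JOIN_ON` and `COLS` are hypothetical names. With the filter, the LEFT JOIN returns exactly the rows of an INNER JOIN on the same condition, because every matched `myt2` row has a non-NULL team in this data. Without the filter, the unmatched rows come back, grouped under a NULL team.

```python
import sqlite3

# Hypothetical helper: loads the question's sample data into an in-memory
# SQLite database (assumption: SQLite stands in for the unnamed DBMS).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myt1 (name VARCHAR(50), date DATE, hours_watched DECIMAL(4,2));
        INSERT INTO myt1 VALUES
          ('name1', '2024-01-15', 2.5), ('name1', '2024-03-05', 1.5),
          ('name1', '2024-06-10', 3.0), ('name2', '2024-01-20', 3.0),
          ('name2', '2024-04-15', 4.5), ('name3', '2024-02-10', 2.0),
          ('name3', '2024-07-15', 5.5), ('name4', '2024-12-01', 4.0);
        CREATE TABLE myt2 (name VARCHAR(50), team VARCHAR(50), subteam INTEGER,
                           date_from DATE, date_to DATE);
        INSERT INTO myt2 VALUES
          ('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
          ('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
          ('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
          ('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
    """)
    return conn

JOIN_ON = "t1.name = t2.name AND t1.date >= t2.date_from AND t1.date <= t2.date_to"
COLS = "t1.name, t1.date, t2.team"

conn = make_db()

# LEFT JOIN plus the post-filter from the question...
left_filtered = sorted(conn.execute(
    f"SELECT {COLS} FROM myt1 t1 LEFT JOIN myt2 t2 ON {JOIN_ON} "
    "WHERE t2.team IS NOT NULL").fetchall())

# ...versus a plain INNER JOIN on the same condition.
inner = sorted(conn.execute(
    f"SELECT {COLS} FROM myt1 t1 INNER JOIN myt2 t2 ON {JOIN_ON}").fetchall())

# Without the filter, unmatched rows survive and group under team = NULL (None).
unfiltered = dict(conn.execute(
    f"SELECT t2.team, SUM(t1.hours_watched) FROM myt1 t1 "
    f"LEFT JOIN myt2 t2 ON {JOIN_ON} GROUP BY t2.team").fetchall())

print(left_filtered == inner, len(inner))
print(unfiltered)
```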
Of course, just removing the filter would return all the unmatched hours under a single NULL team. If you want to identify by name the users who watched movies outside of any team, you can either:
Simply COALESCE() with t1.name in both the SELECT and the GROUP BY:
SELECT
COALESCE(t2.team, '('||t1.name||' personally)') AS team,
SUM(t1.hours_watched) AS total_hours
FROM
myt1 t1
LEFT JOIN
myt2 t2 ON t1.name = t2.name
AND t1.date >= t2.date_from
AND t1.date <= t2.date_to
GROUP BY
COALESCE(t2.team, '('||t1.name||' personally)');
Use a CTE:
WITH by_team_and_user AS
(
SELECT
COALESCE(t2.team, '('||t1.name||' personally)') AS team,
SUM(t1.hours_watched) AS total_hours
FROM
myt1 t1
LEFT JOIN
myt2 t2 ON t1.name = t2.name
AND t1.date >= t2.date_from
AND t1.date <= t2.date_to
GROUP BY
t2.team,
t1.name
)
SELECT team, SUM(total_hours) AS total_hours
FROM by_team_and_user
GROUP BY team;
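The CTE version works in two steps. Its body produces one subtotal per (team, user) pair, and the outer query adds those subtotals up per label. Here is a minimal sketch that shows the intermediate rows. It assumes SQLite through Python's `sqlite3` module and uses a hypothetical `make_db` helper. The outer SUM is redone in plain Python only for illustration.

```python
import sqlite3

# Hypothetical helper: loads the question's sample data into an in-memory
# SQLite database (assumption: SQLite stands in for the unnamed DBMS).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myt1 (name VARCHAR(50), date DATE, hours_watched DECIMAL(4,2));
        INSERT INTO myt1 VALUES
          ('name1', '2024-01-15', 2.5), ('name1', '2024-03-05', 1.5),
          ('name1', '2024-06-10', 3.0), ('name2', '2024-01-20', 3.0),
          ('name2', '2024-04-15', 4.5), ('name3', '2024-02-10', 2.0),
          ('name3', '2024-07-15', 5.5), ('name4', '2024-12-01', 4.0);
        CREATE TABLE myt2 (name VARCHAR(50), team VARCHAR(50), subteam INTEGER,
                           date_from DATE, date_to DATE);
        INSERT INTO myt2 VALUES
          ('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
          ('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
          ('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
          ('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
    """)
    return conn

conn = make_db()

# Step 1: the CTE body on its own, one subtotal per (team, user) pair.
step1 = sorted(conn.execute("""
    SELECT COALESCE(t2.team, '(' || t1.name || ' personally)') AS team,
           SUM(t1.hours_watched) AS total_hours
    FROM myt1 t1
    LEFT JOIN myt2 t2 ON t1.name = t2.name
                     AND t1.date >= t2.date_from
                     AND t1.date <= t2.date_to
    GROUP BY t2.team, t1.name
""").fetchall())

# Step 2: what the outer SELECT ... GROUP BY team does, redone in Python.
step2 = {}
for team, hours in step1:
    step2[team] = step2.get(team, 0) + hours

print(step1)
print(step2)
```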
Use a subquery:
SELECT team, SUM(total_hours) AS total_hours
FROM
(
SELECT
COALESCE(t2.team, '('||t1.name||' personally)') AS team,
SUM(t1.hours_watched) AS total_hours
FROM
myt1 t1
LEFT JOIN
myt2 t2 ON t1.name = t2.name
AND t1.date >= t2.date_from
AND t1.date <= t2.date_to
GROUP BY
t2.team,
t1.name
)
AS by_team_and_user
GROUP BY team;
Really, it's up to you. All three return the same result:
TEAM | TOTAL_HOURS |
---|---|
(name3 personally) | 5.50 |
(name4 personally) | 4.00 |
Team A | 10.00 |
Team B | 6.50 |
that you can see in this fiddle.
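As a quick check of that claim, here is a sketch that runs all three queries against the sample data and compares the sorted results. It assumes SQLite through Python's `sqlite3` module, since the fiddle's DBMS isn't stated. `make_db`, `LABEL`, `JOINED` and `PER_USER` are hypothetical names that avoid repeating the SQL fragments.

```python
import sqlite3

# Hypothetical helper: loads the question's sample data into an in-memory
# SQLite database (assumption: SQLite stands in for the unnamed DBMS).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myt1 (name VARCHAR(50), date DATE, hours_watched DECIMAL(4,2));
        INSERT INTO myt1 VALUES
          ('name1', '2024-01-15', 2.5), ('name1', '2024-03-05', 1.5),
          ('name1', '2024-06-10', 3.0), ('name2', '2024-01-20', 3.0),
          ('name2', '2024-04-15', 4.5), ('name3', '2024-02-10', 2.0),
          ('name3', '2024-07-15', 5.5), ('name4', '2024-12-01', 4.0);
        CREATE TABLE myt2 (name VARCHAR(50), team VARCHAR(50), subteam INTEGER,
                           date_from DATE, date_to DATE);
        INSERT INTO myt2 VALUES
          ('name1', 'Team A', 1, '2024-01-01', '2024-02-29'),
          ('name1', 'Team B', 2, '2024-03-01', '2024-12-31'),
          ('name2', 'Team A', 2, '2024-01-01', '2024-12-31'),
          ('name3', 'Team B', 1, '2024-01-01', '2024-06-30');
    """)
    return conn

# Shared SQL fragments taken from the three queries above.
LABEL = "COALESCE(t2.team, '(' || t1.name || ' personally)')"
JOINED = ("myt1 t1 LEFT JOIN myt2 t2 ON t1.name = t2.name "
          "AND t1.date >= t2.date_from AND t1.date <= t2.date_to")
PER_USER = (f"SELECT {LABEL} AS team, SUM(t1.hours_watched) AS total_hours "
            f"FROM {JOINED} GROUP BY t2.team, t1.name")

queries = {
    "coalesce": f"SELECT {LABEL} AS team, SUM(t1.hours_watched) AS total_hours "
                f"FROM {JOINED} GROUP BY {LABEL}",
    "cte": f"WITH by_team_and_user AS ({PER_USER}) "
           "SELECT team, SUM(total_hours) AS total_hours "
           "FROM by_team_and_user GROUP BY team",
    "subquery": "SELECT team, SUM(total_hours) AS total_hours "
                f"FROM ({PER_USER}) AS by_team_and_user GROUP BY team",
}

conn = make_db()
# Sort each result: none of the queries has an ORDER BY, so row order may differ.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```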