In SQL I'm trying to combine multiple tables and grab the SUM of expenses per person, and sort those by highest total expense first. I have 3 tables:
I cannot edit test1 and test2 in the production environment. I created junction1 as a bridge to connect test1 and test2. I can modify columns/content in junction1. The IDs of test1 and test2 may change in the future (right now they are the same).
Desired result:
I need to do a full join on all tables, since I want to include all personnel from both tables. test1 and test2 are independent, as some people only shop in test1 locations and some only shop in test2 locations. Also to sort by Total SUM of both tables I tried:
ORDER BY SUM(Grocery1 + Grocery2) DESC
No luck.
A SUM select statement (no joins) works:
select junction1.Name1, SUM(Amount) AS Grocery1
from test1
FULL JOIN junction1 on junction1.ID1= test1.ID1
GROUP BY junction1.Name1 ORDER BY Grocery1 DESC;
But when I join the table(s):
select junction1.Name1, SUM(test1.Amount) AS Grocery1, SUM(test2.Amount) AS Grocery2
from test1
FULL JOIN junction1 ON test1.ID1 = junction1.ID1
FULL JOIN test2 ON test2.ID2 = junction1.ID2
GROUP BY junction1.Name1
It gives:
The data is off in both columns. Andy should only have $400 for Grocery1. It looks like it's multiplying it instead of adding it. I tried to divide by 3, which helps some of the people with 3 entries, but that's probably not what I want.
I don't think you want a full join. You would if both tables could be joined together directly, but the fact that we need to to through the junction table changes the situation.
We can start from the junction table, then union
both grocery tables (while separating the original columns) and bring them with a left join
on both possibles matches. The last step is aggregation:
select j.name1, sum(t.amount1) as grocery1, sum(t.amount2) as grocery2
from junction1 j
left join (
select id1, null as id2, amount as amount1, null as amount2 from test1
union all select null, id2, null, amount from test2
) t on t.id1 = j.id1 or t.id2 = j.id2
group by j.name1
Using union
avoids the "row multiplication" issue that you are seeing. On the other hand, the left join
ensures that records from both tables are preserved, regardless of whether or not the same person actually shopped at both locations.