sqlhadoopdbvisualizer

SQL Hadoop - Combine two tables and sum


My goal is to combine two tables, where some IDs may be present in one table and not in another. If the ID and year are the same, then the amount should be summed.

Table 1:
ID   Year  Amount
111  2010  10.50
111  2011  5
123  2010  6
124  2010  8

Table 2:
ID  Year  Amount
111 2010  10
124 2011  10
125 2011  5

Output:
ID   Year  Amount
111  2010  20.50
111  2011  5
123  2010  6
124  2010  8
124  2011  10
125  2011  5

I was going to first UNION ALL both tables as such:

select *
from schema.table1
UNION ALL
select *
from schema.table2

But that gets me only a little close to my goal. Should I do case when statements, can I do an UNION ALL and a sum?

select ID, year, sum(a.year + b.year)
from schema.table1 a
UNION ALL
from schema.table2 b

Solution

  • You can use a subquery:

    select id, year, sum(amount)
    from ((select *
           from schema.table1
          ) union all
          (select *
           from schema.table2
          )
         ) t12
    group by id, year;
    

    You can also use a full join, but with logs of coalesce()s:

    select coalesce(t1.id, t2.id), coalesce(t1.year, t2.year),
           (coalesce(t1.amount, 0) + coalesce(t2.amount, 0))
    from schema.table1 t1 full join
         schema.table2 t2
         on t1.id = t2.id and t1.year = t2.year;