sqlhiveleft-joinimpala

How to avoid left join to return increased number of records than actual


I am performing left join and it's giving me increased number of records.

table 1 has 29 records

table 2 has 2 records

after left join I am getting 31 records instead of 29.

How can I get 29 records

table 1 has 29 records table 2 has 2 records

output is giving me 31 records using left join

Here is the query.

table1.a1 table1.a2, table1.a3, table1.a4, table2.b1, table2.b2, table2.b3,

case when table1.a1 is not null then round(table1.a1,2), table2.b1 is not null then round(table2.b1,2) else 0 end as amount

from transaction table1 left join account table2

on table1.a1=table2.b1 where table1.external_status in ('A','F')

I am expecting 29 records from table1


Solution

  • You need to deduplicate data from tabel2 before joining using a CTE or subquery.

    I used your example SQL and modified using a subquery. You can use distinct as well.

    table1.a1 table1.a2, table1.a3, table1.a4, table2.b1, table2.b2, table2.b3,
    case when table1.a1 is not null then round(table1.a1,2), table2.b1 is not null then round(table2.b1,2) else 0 end as amount
    from transaction table1 left join 
    (select b1,b2,b3, row_number() over(partition by b1 order by b2,b3) as rownum from account) table2
    on table1.a1=table2.b1 and table2.rownum=1 where table1.external_status in ('A','F')
    

    row_number() will assign number to each row for b1 data like 1,2,3 if there are dups. Now when you do rownum=1, you eliminate duplicates.