sqldatabaseoracle19coracle

Join two tables based on multiple column comparison


I have a country dimension table COUNTRY_DIM like below:

ID SHORT_NAME FULL_NAME ALTERNATE_NAME
US USA United States of America United States
UK UK United Kingdom Great Britain
IN India Republic of India
ROK South Korea Republic of Korea
DE Germany Federal Republic of Germany Deutschland

I have another table COUNTRY_FACT where I have Country Name and some values like below:

COUNTRY_NAME VALUE
United States 30
India 6
United Kingdom 10
South Korea 2
Republic of Korea 1
Germany 4
USA 2

I want to join both the tables on SHORT_NAME first, if not matched then FULL_NAME if not matched then ALTERNATE_NAME and derive the Country ID.

So the output will be like below :

COUNTRY_ID VALUE
US 32
IN 6
UK 10
ROK 3
DE 4

I am using the below query:

select dim.country_id,sum(fact.value) value
from contry_fact fact
inner join country_dim dim
on
case when upper(country_name) = upper(short_name) then 1
when upper(country_name) = upper(full_name) then 1
when upper(country_name) = upper(alternate_name) then 1
else 0 end = 1
group by dim.country_id

Is this the correct method to get the desired output or we can achieve it another way ?

I am using Oracle 19c


Solution

  • Don't try to join once when you have multiple candidate join columns or any kind of conditional logic in the join clause (like that CASE statement). It creates a performance headache and limits the join methods Oracle can use.

    Instead, join separately for each candidate join column, then use COALESCE in the order you want matching the priority to select the desired dimension:

    SELECT COALESCE(d1.country_id,d2.country_id,d3.country_id) country_id,
           SUM(f.value) value
      FROM country_fact f
           LEFT OUTER JOIN country_dim d1 ON UPPER(f.short_name) = UPPER(d1.country_name)
           LEFT OUTER JOIN country_dim d2 ON UPPER(f.full_name) = UPPER(d2.country_name)
           LEFT OUTER JOIN country_dim d3 ON UPPER(f.alternate_name) = UPPER(d3.country_name)
     GROUP BY COALESCE(d1.country_id,d2.country_id,d3.country_id)
           
    

    However that you're doing an aggregation suggests this is using an actual fact table rather than obtaining a dimension key to load a fact table from staging. If that's the case, you really shouldn't have text country strings in your fact table at all. It should already be replaced by the surrogate country_id at ETL time so this kind of query complexity by users isn't needed.