Convert SQL query that uses CASE expression as a JOIN key to pandas


SELECT a.year, a.country, b.amount
FROM table_a a
LEFT JOIN table_b b
ON a.country=b.country
AND (CASE WHEN b.country = 'Europe' THEN b.year = 2022 ELSE b.year = 2023 END)

I am trying to translate this query into Python. I have tried pd.merge() for the left join, but I am unsure how to express the CASE WHEN part of the join condition in a pandas merge.

Given the following input:

table_a = pd.DataFrame({
    'country': ['Europe', 'Europe', 'USA', 'Africa'],
    'year': [2022, 2020, 2023, 2021]
})
table_b = pd.DataFrame({
    'country': ['Europe', 'USA', 'Africa', 'USA', 'Europe'],
    'year': [2023, 2022, 2022, 2023, 2022],
    'amount': [10, 20, 30, 40, 50]
})

the output should be:

  country  year  amount
0  Europe  2022    50.0
1  Europe  2020    50.0
2     USA  2023    40.0
3  Africa  2021     NaN

How do I make it work?


Solution

  • The CASE condition references only table_b, so in a LEFT JOIN it acts as a filter on table_b before the join. We can therefore filter table_b with that condition first and then left-merge on the country column.

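    # Keep only the table_b rows the CASE expression allows:
    # year 2022 for Europe, year 2023 for every other country.
    filtered_b = table_b.query(
        "(country == 'Europe' and year == 2022) or (country != 'Europe' and year == 2023)"
    )
    # Left-join on country; the '_' suffix goes on table_b's year, so table_a's stays 'year'.
    output = table_a.merge(
        filtered_b, on=['country'], how='left', suffixes=('', '_')
    )[['year', 'country', 'amount']]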

    With the input from the question, the above code produces the following output:

       year country  amount
    0  2022  Europe    50.0
    1  2020  Europe    50.0
    2  2023     USA    40.0
    3  2021  Africa     NaN
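
If you would rather mirror the CASE expression directly, one possible variant is to compute the year each table_b row is required to have with numpy.where and compare it against the actual year. This is only a sketch of the same filter-then-merge idea, using the input from the question; the names wanted_year, filtered_b and result are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

table_a = pd.DataFrame({
    'country': ['Europe', 'Europe', 'USA', 'Africa'],
    'year': [2022, 2020, 2023, 2021]
})
table_b = pd.DataFrame({
    'country': ['Europe', 'USA', 'Africa', 'USA', 'Europe'],
    'year': [2023, 2022, 2022, 2023, 2022],
    'amount': [10, 20, 30, 40, 50]
})

# Per row of table_b: the year the CASE expression requires
# (2022 when the country is Europe, 2023 otherwise).
wanted_year = np.where(table_b['country'] == 'Europe', 2022, 2023)

# Keep the rows whose actual year matches the required one.
# table_b's year is dropped here so the merge yields a single 'year' column.
filtered_b = table_b.loc[table_b['year'] == wanted_year, ['country', 'amount']]

# Plain left join on country, as in the SQL query.
result = table_a.merge(filtered_b, on='country', how='left')
print(result)
```

Dropping table_b's year before the merge removes the need for suffixes, and the result keeps table_a's column order (country, year, amount), which is the layout shown in the question's expected output.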