sqlpandasjoinleft-join

How to left join with multiple conditions in pandas?


I'm trying to somehow convert an existing SQL statement to pandas. These are the dataframes I'm working with:

df_products:

ID  PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY        
1         P1  PRODUCT_P1     12          15          UPS  
2         P2  PRODUCT_P2      4           3          DHL  
3         P3  PRODUCT_P3    120          22          DHL  
4         P1  PRODUCT_P1    423          18          UPS  
5         P2  PRODUCT_P2      0           5          GLS  
6         P3  PRODUCT_P3     53          10          DHL  
7         P4  PRODUCT_P4     22           0          UPS  
8         P1  PRODUCT_P1     94          56          GLS  
9         P1  PRODUCT_P1      9          24          GLS

and

df_accessories:

ID ACCESSORY_ID         NAME DEL_BY SUITABLE_FOR MANUFACTURER
100           A1  ACCESSORY_1    DHL           P1         KUNG
101           A2  ACCESSORY_2    UPS           P1          PAO
102           A3  ACCESSORY_3    GLS           P1          PAO
103           A4  ACCESSORY_4    UPS           P3          PAK
104           A5  ACCESSORY_5    DHL           P2          PAK

I'm trying to apply the pandas version of this SQL query:

SELECT *
FROM products a
LEFT JOIN accessories b
    ON b.DEL_BY = 'UPS'
    AND a.PRODUCT_ID = b.SUITABLE_FOR
    AND b.MANUFACTURER != 'PAK'

I tried to solve this like this:

joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]

But I don't think it works this way. I'm strugging already with the first ON b.DEL_BY = 'UPS' statement where I don't know where to put that in pandas merge function.

I'm expecting this result:

   ID PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY  ďťżID ACCESSORY_ID       NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0   1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
1   2         P2  PRODUCT_P2      4           3          DHL    NaN          NaN          NaN    NaN          NaN          NaN
2   3         P3  PRODUCT_P3    120          22          DHL    NaN          NaN          NaN    NaN          NaN          NaN
3   4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
4   5         P2  PRODUCT_P2      0           5          GLS    NaN          NaN          NaN    NaN          NaN          NaN
5   6         P3  PRODUCT_P3     53          10          DHL    NaN          NaN          NaN    NaN          NaN          NaN
6   7         P4  PRODUCT_P4     22           0          UPS    NaN          NaN          NaN    NaN          NaN          NaN
7   8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
8   9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

but I'm getting this instead:

    ID_x PRODUCT_ID      NAME_x  STOCK  SELL_COUNT DELIVERED_BY   ID_y ACCESSORY_ID       NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1      1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
6      4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
12     8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
15     9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

Thanks


Solution

  • You filter the right data frame before it goes into merge:

    df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
                      left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
                      suffixes=('', '.1'))
    

    The .query(...) piece is equivalent to slicing the data frame:

    cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
    df_products.merge(df_accessories[cond], ...)