I'm trying to somehow convert an existing SQL statement to pandas. These are the dataframes I'm working with:
df_products:
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
and
df_accessories:
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
I'm trying to apply the pandas version of this SQL query:
SELECT *
FROM products a
LEFT JOIN accessories b
ON b.DEL_BY = 'UPS'
AND a.PRODUCT_ID = b.SUITABLE_FOR
AND b.MANUFACTURER != 'PAK'
I tried to solve this like this:
joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]
But I don't think it works this way. I'm strugging already with the first ON b.DEL_BY = 'UPS' statement where I don't know where to put that in pandas merge function.
I'm expecting this result:
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY ďťżID ACCESSORY_ID NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
1 2 P2 PRODUCT_P2 4 3 DHL NaN NaN NaN NaN NaN NaN
2 3 P3 PRODUCT_P3 120 22 DHL NaN NaN NaN NaN NaN NaN
3 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
4 5 P2 PRODUCT_P2 0 5 GLS NaN NaN NaN NaN NaN NaN
5 6 P3 PRODUCT_P3 53 10 DHL NaN NaN NaN NaN NaN NaN
6 7 P4 PRODUCT_P4 22 0 UPS NaN NaN NaN NaN NaN NaN
7 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
8 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
but I'm getting this instead:
ID_x PRODUCT_ID NAME_x STOCK SELL_COUNT DELIVERED_BY ID_y ACCESSORY_ID NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
6 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
12 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
15 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
Thanks
You filter the right data frame before it goes into merge:
df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
suffixes=('', '.1'))
The .query(...)
piece is equivalent to slicing the data frame:
cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
df_products.merge(df_accessories[cond], ...)