I have two separate data frames, where I'd like to add specific columns from the second data frame to the first one but I want their values to match specific rows in the first one, mainly whenever its says 'red' in the 'descriptor' column and also match for 'date' and 'product'.
Data frame 1:
date product descriptor
jan product1 blue
feb product2 green
mar product1 red
apr product2 blue
may product1 red
jun product2 red
Data frame 2:
date product val
mar product1 x
may product1 y
jun product2 z
I want to achieve something like this:
date product descriptor val
jan product1 blue
feb product2 green
mar product1 red x
apr product2 blue
may product1 red y
jun product2 red z
I've tried looking for something similar but couldn't find anything that exactly matches my question.
You can merge
, mask
and combine_first
:
out = df1.combine_first(df1.merge(df2.drop_duplicates(['date', 'product']),
on=['date', 'product'], how='left')
.where(df1['descriptor'].eq('red'))
)
Or, align with a MultiIndex:
cols = ['date', 'product']
df1.loc[m, ['val']] = (df2.set_index(cols)
.reindex(pd.MultiIndex.from_frame(df1[cols]))
.set_axis(df1.index)
.where(df1['descriptor'].eq('red'))
)
Or perform a classical merge
, then fix the values:
out = df1.merge(df2, how='left')
out.loc[df1['descriptor'].ne('red'), ['val']] = float('nan')
Output:
date descriptor product val
0 jan blue product1 NaN
1 feb green product2 NaN
2 mar red product1 x
3 apr blue product2 NaN
4 may red product1 y
5 jun red product2 z