pythonpandasdataframepandas-merge

Adding specific values from one dataframe into another


I have two separate data frames, where I'd like to add specific columns from the second data frame to the first one but I want their values to match specific rows in the first one, mainly whenever its says 'red' in the 'descriptor' column and also match for 'date' and 'product'.

Data frame 1:

date   product    descriptor        
jan    product1   blue
feb    product2   green
mar    product1   red
apr    product2   blue
may    product1   red       
jun    product2   red        

Data frame 2:

date   product    val        
mar    product1   x
may    product1   y
jun    product2   z

I want to achieve something like this:

date   product    descriptor  val       
jan    product1   blue
feb    product2   green
mar    product1   red          x
apr    product2   blue
may    product1   red          y
jun    product2   red          z

I've tried looking for something similar but couldn't find anything that exactly matches my question.


Solution

  • You can merge, mask and combine_first:

    out = df1.combine_first(df1.merge(df2.drop_duplicates(['date', 'product']),
                                      on=['date', 'product'], how='left')
                               .where(df1['descriptor'].eq('red'))
                           )
    

    Or, align with a MultiIndex:

    cols = ['date', 'product']
    
    df1.loc[m, ['val']] = (df2.set_index(cols)
                              .reindex(pd.MultiIndex.from_frame(df1[cols]))
                              .set_axis(df1.index)
                              .where(df1['descriptor'].eq('red'))
                          )
    

    Or perform a classical merge, then fix the values:

    out = df1.merge(df2, how='left')
    out.loc[df1['descriptor'].ne('red'), ['val']] = float('nan')
    

    Output:

      date descriptor   product  val
    0  jan       blue  product1  NaN
    1  feb      green  product2  NaN
    2  mar        red  product1    x
    3  apr       blue  product2  NaN
    4  may        red  product1    y
    5  jun        red  product2    z