python pandas pandas-merge

Insert/replace/merge values from one dataframe to another


I have two dataframes like this:

import pandas as pd

df1 = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                    'ID2':['0','10','80','0','0','0']})
df2 = pd.DataFrame({'ID1':['A','D','E','F'],
                    'ID2':['50','30','90','50'],
                    'aa':['1','2','3','4']})

df1,df2

I want to replace the ID2 values in df1 with the ID2 values from df2 wherever ID1 matches, and at the same time add the aa column from df2 to df1, aligned on ID1, to obtain a new dataframe like this:

df_result = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                          'ID2':['50','10','80','30','90','50'],
                          'aa':['1','NaN','NaN','2','3','4']})

df_result

I've tried to use merge, but it didn't work.


Solution

  • You can use combine_first after setting ID1 as the index of both DataFrames:

    (df2.set_index('ID1')  # values of df2 have priority in case of overlap
        .combine_first(df1.set_index('ID1')) # add missing values from df1
        .reset_index()     # reset ID1 as column
    )
    

    output:

      ID1 ID2   aa
    0   A  50    1
    1   B  10  NaN
    2   C  80  NaN
    3   D  30    2
    4   E  90    3
    5   F  50    4
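
For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds df1 and df2 from the question, applies the combine_first approach, and then shows a merge-based variant, since merge was what the question originally tried. The names result and merged and the '_new' suffix are illustrative choices, not from the original posts.

```python
import pandas as pd

df1 = pd.DataFrame({'ID1': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'ID2': ['0', '10', '80', '0', '0', '0']})
df2 = pd.DataFrame({'ID1': ['A', 'D', 'E', 'F'],
                    'ID2': ['50', '30', '90', '50'],
                    'aa': ['1', '2', '3', '4']})

# combine_first: non-null values of df2 win, gaps are filled from df1.
result = (df2.set_index('ID1')
             .combine_first(df1.set_index('ID1'))
             .reset_index())

# Merge-based variant (illustrative): a left merge keeps every row of df1,
# then the df2 values overwrite ID2 wherever a match was found.
merged = df1.merge(df2, on='ID1', how='left', suffixes=('', '_new'))
merged['ID2'] = merged['ID2_new'].fillna(merged['ID2'])
merged = merged.drop(columns='ID2_new')

print(result)
print(merged)
```

Two things to keep in mind: combine_first returns the union of both indexes in sorted order, so the rows come back sorted by ID1 rather than in df1's original order (the two happen to coincide here), and a null value in df2 does not overwrite the corresponding df1 value. The left merge keeps df1's row order.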