pythonpandasdataframepath-combine

How do I combine_first, indexing specific column two DataFrames into one?


After mapping by column c, If column A has a value, insert the value of column A; if not, insert column B.

data1                               data2

a    b    c                      a    c    d
a1   b1   c1                     1a   c1   1d   
     b2   c2                     2a   c2   2d
a3        c3                     3a   c3   3d
                                 4a   c4   4d

The result I want

  result
   a    b   c     
   a1   b1  c1
   2a   b2  c2
   a3       c3

I tried the following, but I was not satisfied.

->>> result = data1.merge(data2, on=['c'])
Prefixes _x and _y are created. combine_first is not applied.

->>> result = data1.combine_first(data2)
It is not mapped by column c.

How do I get good results? I ask for your help. thank you


Solution

  • Using @IdoS setup:

    import pandas as pd
    data1 = pd.DataFrame({'a': ['a1', None, 'a3'],
                          'b': ['b1', 'b2', None],
                          'c': ['c1', 'c2', 'c3']})
    
    data2 = pd.DataFrame({'a': ['1a', '2a', '3a', '4a'],
                          'c': ['c1', 'c2', 'c3', 'c4'],
                          'd': ['1d', '2d', '3d', '4d']})
    

    You can use set_index, combine_first, and reindex:

    df_out = data1.set_index('c').combine_first(data2.set_index('c'))\
         .reindex(data1.c)\
         .reset_index()
    
    df_out
    

    Output:

        c   a     b   d
    0  c1  a1    b1  1d
    1  c2  2a    b2  2d
    2  c3  a3  None  3d