python-3.x pandas dataframe

Is it possible to merge rows in a pandas DataFrame and concatenate selected fields?


I want to combine rows in a DataFrame based on a grouping, but I am struggling to get started.

For one field (col_b), I want to concatenate the values. For another (col_c), I want to sum them.

Example:

Input:

col_a    col_b    col_c    col_d
Alex     Milk     5        UK
Alex     Sugar    4        USA
David    Rice     3        Spain
Alex     Wheat    1        UK

Output:

col_a    col_b          col_c    col_d
Alex     Milk | Wheat   6        UK
Alex     Sugar          4        USA
David    Rice           3        Spain

Solution

  • You can use groupby.agg(): group on col_a and col_d, then give each remaining column its own aggregation.

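    # Group on the columns that identify a merged row, then aggregate the rest.
    out = df.groupby(['col_a', 'col_d'], as_index=False).agg(
        {
            'col_b': ' | '.join,  # concatenate the strings in each group
            'col_c': 'sum',       # add up the numeric values in each group
        }
    )

    print(out)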
    
       col_a  col_d         col_b  col_c
    0   Alex     UK  Milk | Wheat      6
    1   Alex    USA         Sugar      4
    2  David  Spain          Rice      3
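
The result above puts the grouping keys first, so col_d comes before col_b. If the column order from the question matters, one option is to reindex the columns afterwards. The following is a minimal, self-contained sketch; it assumes the sample data from the question, that col_b holds only strings, and that col_c is numeric. The variable names `df` and `out` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumed dtypes: str, str, int, str).
df = pd.DataFrame(
    {
        "col_a": ["Alex", "Alex", "David", "Alex"],
        "col_b": ["Milk", "Sugar", "Rice", "Wheat"],
        "col_c": [5, 4, 3, 1],
        "col_d": ["UK", "USA", "Spain", "UK"],
    }
)

# Group on the key columns, then join col_b and sum col_c within each group.
out = df.groupby(["col_a", "col_d"], as_index=False).agg(
    {"col_b": " | ".join, "col_c": "sum"}
)

# Restore the original column order (col_a, col_b, col_c, col_d).
out = out[df.columns.tolist()]

print(out)
#    col_a         col_b  col_c  col_d
# 0   Alex  Milk | Wheat      6     UK
# 1   Alex         Sugar      4    USA
# 2  David          Rice      3  Spain
```

Note that `' | '.join` raises a TypeError if col_b contains non-string values such as NaN, so real data may need those converted or dropped first.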