I'm looking to combine rows in a DataFrame based on a grouping, but I'm struggling to even get started.
For one field, I want to concatenate values. For another I want to sum.
Example: group by col_a and col_d, concatenate col_b with a | separator, and sum col_c.
Input:
col_a col_b col_c col_d
Alex Milk 5 UK
Alex Sugar 4 USA
David Rice 3 Spain
Alex Wheat 1 UK
Output:
col_a col_b col_c col_d
Alex Milk | Wheat 6 UK
Alex Sugar 4 USA
David Rice 3 Spain
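You can use groupby.agg() with a dict that maps each column to its aggregation: a function that joins the strings for col_b, and 'sum' for col_c.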
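import pandas as pd

# Group on col_a and col_d; as_index=False keeps the keys as regular columns
out = df.groupby(['col_a', 'col_d'], as_index=False).agg(
    {
        'col_b': lambda x: ' | '.join(x),  # concatenate the strings in each group
        'col_c': 'sum',                    # add up the numbers in each group
    }
)
print(out)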
col_a col_d col_b col_c
0 Alex UK Milk | Wheat 6
1 Alex USA Sugar 4
2 David Spain Rice 3
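The result above lists col_d before col_b, because the group keys come first. If you want the exact column order from the question, one option is named aggregation (available in pandas 0.25 and later) followed by a column selection. This is a self-contained sketch that rebuilds the example input; it assumes col_b holds only strings (if it might contain numbers or NaN, you would need something like df['col_b'].astype(str) first, since str.join only accepts strings).

```python
import pandas as pd

# Input data from the question
df = pd.DataFrame({
    'col_a': ['Alex', 'Alex', 'David', 'Alex'],
    'col_b': ['Milk', 'Sugar', 'Rice', 'Wheat'],
    'col_c': [5, 4, 3, 1],
    'col_d': ['UK', 'USA', 'Spain', 'UK'],
})

# Named aggregation: new_column=(source_column, aggregation)
out = (
    df.groupby(['col_a', 'col_d'], as_index=False)
      .agg(col_b=('col_b', ' | '.join), col_c=('col_c', 'sum'))
      [['col_a', 'col_b', 'col_c', 'col_d']]  # reorder columns to match the input
)
print(out)
```

Passing ' | '.join directly works the same as the lambda, since it is already a function that takes an iterable of strings.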