Below is the input data df1
A B C D E F G
Messi Forward Argentina 1 Nan 5 6
Ronaldo Defender Portugal Nan 4 Nan 3
Messi Midfield Argentina Nan 5 Nan 6
Ronaldo Forward Portugal 3 Nan 2 3
Mbappe Forward France 1 3 2 5
Below is the intended output
df
A B C D E F G
Messi Forward,Midfield Argentina 1 5 5 6
Ronaldo Forward,Defender Portugal 3 4 2 3
Mbappe Forward France 1 3 2 5
My try:
df.groupby(['A','C'])['B'].agg(','.join).reset_index()
df.fillna(method='ffill')
Do we have a better way to do this ?
You can get first non missing values per groups by all columns without A,C
and for B
aggregate by join
:
d = dict.fromkeys(df.columns.difference(['A','C']), 'first')
d['B'] = ','.join
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d)
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1.0 5.0 5.0 6
1 Ronaldo Portugal Defender,Forward 3.0 4.0 2.0 3
2 Mbappe France Forward 1.0 3.0 2.0 5
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d).convert_dtypes()
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1 5 5 6
1 Ronaldo Portugal Defender,Forward 3 4 2 3
2 Mbappe France Forward 1 3 2 5