pythonpandasdataframestring-concatenationmethod-missing

Row wise concatenation and replacing nan with common column values


Below is the input data df1

A        B        C          D  E   F   G
Messi   Forward   Argentina  1  Nan 5   6
Ronaldo Defender  Portugal  Nan 4   Nan 3
Messi   Midfield  Argentina Nan 5   Nan 6
Ronaldo Forward   Portugal   3  Nan 2   3
Mbappe  Forward   France    1   3   2   5

Below is the intended output

df

A         B                   C         D   E   F   G
Messi   Forward,Midfield    Argentina   1   5   5   6
Ronaldo Forward,Defender    Portugal    3   4   2   3
Mbappe  Forward               France    1   3   2   5

My try:

df.groupby(['A','C'])['B'].agg(','.join).reset_index()
df.fillna(method='ffill')

Do we have a better way to do this ?


Solution

  • You can get first non missing values per groups by all columns without A,C and for B aggregate by join:

    d = dict.fromkeys(df.columns.difference(['A','C']), 'first')
    d['B'] = ','.join
    
    df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d)
    print (df1)
             A          C                 B    D    E    F  G
    0    Messi  Argentina  Forward,Midfield  1.0  5.0  5.0  6
    1  Ronaldo   Portugal  Defender,Forward  3.0  4.0  2.0  3
    2   Mbappe     France           Forward  1.0  3.0  2.0  5
    

    df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d).convert_dtypes()
    print (df1)
             A          C                 B  D  E  F  G
    0    Messi  Argentina  Forward,Midfield  1  5  5  6
    1  Ronaldo   Portugal  Defender,Forward  3  4  2  3
    2   Mbappe     France           Forward  1  3  2  5