python pandas dataframe group-by

Fill a new DataFrame column using conditions after groupby, using more than 2 columns and 2 rows


I have the following table. I need to group by Col1 and check whether Col2 contains Y within each group.

If it does, create a new column Col4 and put the Col3 value from the Y row into every row of that group; if it does not, just copy Col3 into Col4.

index Col1 Col2 Col3
0 1 X ABC
1 1 Y XX
2 1 X QW
3 2 X VB
4 2 X AY
5 3 X MM
6 3 X YY
7 3 Y XX

Desired table

index Col1 Col2 Col3 New_Col
0 1 X ABC XX
1 1 Y XX XX
2 1 X QW XX
3 2 X VB VB
4 2 X AY AY
5 3 X MM XX
6 3 X YY XX
7 3 Y XX XX

Solution

  • Select only the Y rows with boolean indexing, map their Col3 values onto every row through Col1 with Series.map, and finally replace the unmatched values (groups without a Y row) with Series.fillna:

    s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']
    
    df['Col4'] = df['Col1'].map(s).fillna(df['Col3'])
    print (df)
       index  Col1 Col2 Col3 Col4
    0      0     1    X  ABC   XX
    1      1     1    Y   XX   XX
    2      2     1    X   QW   XX
    3      3     2    X   VB   VB
    4      4     2    X   AY   AY
    5      5     3    X   MM   XX
    6      6     3    X   YY   XX
    7      7     3    Y   XX   XX
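
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the question's table (without the extra `index` column), and it assumes each group has at most one Y row, since `Series.map` needs a lookup Series with a unique index.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumption: no separate "index" column).
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})

# Lookup Series: Col1 value -> Col3 of that group's Y row.
s = df[df["Col2"].eq("Y")].set_index("Col1")["Col3"]

# Groups without a Y row map to NaN and fall back to their own Col3.
df["Col4"] = df["Col1"].map(s).fillna(df["Col3"])

print(df["Col4"].tolist())
# ['XX', 'XX', 'XX', 'VB', 'AY', 'XX', 'XX', 'XX']
```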
    

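    If the matched Col3 value can itself be NaN, fillna would wrongly overwrite it with the row's own Col3, so the solution is modified to test group membership explicitly: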

    print (df)
       index  Col1 Col2 Col3
    0      0     1    X  ABC
    1      1     1    Y  NaN
    2      2     1    X   QW
    3      3     2    X   VB
    4      4     2    X   AY
    5      5     3    X   MM
    6      6     3    X   YY
    7      7     3    Y   XX
    
    
    import numpy as np
    
    s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']
    
    df['Col4'] = np.where(df['Col1'].isin(s.index), df['Col1'].map(s), df['Col3'])
    print (df)
       index  Col1 Col2 Col3 Col4
    0      0     1    X  ABC  NaN
    1      1     1    Y  NaN  NaN
    2      2     1    X   QW  NaN
    3      3     2    X   VB   VB
    4      4     2    X   AY   AY
    5      5     3    X   MM   XX
    6      6     3    X   YY   XX
    7      7     3    Y   XX   XX
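
To see why the membership test matters, the sketch below runs both variants on the NaN data and compares them. The DataFrame is rebuilt from the table above (again without the `index` column), and the names `with_fillna` and `has_y` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as above, but the Y row of group 1 has a missing Col3.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", np.nan, "QW", "VB", "AY", "MM", "YY", "XX"],
})

s = df[df["Col2"].eq("Y")].set_index("Col1")["Col3"]

# fillna cannot tell "group has no Y row" apart from "Y row holds NaN",
# so group 1 silently falls back to its own Col3 values.
with_fillna = df["Col1"].map(s).fillna(df["Col3"])

# Testing membership in s.index keeps the NaN that was really matched.
has_y = df["Col1"].isin(s.index)
df["Col4"] = np.where(has_y, df["Col1"].map(s), df["Col3"])

print(with_fillna.tolist())
print(df["Col4"].tolist())
```

In the first list group 1 keeps `ABC` and `QW`, while in the second all three rows of group 1 are NaN, which matches the output shown above.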
    

    EDIT: Solution if there are multiple Y values per group - pick one random Y row per group with DataFrameGroupBy.sample; the Y rows keep their own Col3 and the remaining rows get the sampled value:

    m = df['Col2'].eq('Y')
    
    s = df[m].groupby(df['Col1']).sample(1).set_index('Col1')['Col3']
    df['Col4'] = df['Col3'].where(m).fillna(df['Col1'].map(s)).fillna(df['Col3'])
    
    print (df)
       index  Col1 Col2 Col3 Col4
    0      0     1    Y  ABC  ABC
    1      1     1    Y   XX   XX
    2      2     1    X   QW  ABC
    3      3     2    X   VB   VB
    4      4     2    X   AY   AY
    5      5     3    X   MM   XX
    6      6     3    X   YY   XX
    7      7     3    Y   XX   XX
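
Because `sample` picks a different row on each run, the result above is not reproducible. As a hedged sketch, two ways to make it repeatable are passing `random_state` to `sample`, or dropping the randomness and taking the first Y row per group with `drop_duplicates`. The second option changes the rule from "random" to "first", which is an assumption about what is acceptable here; `s_random` and `s_first` are illustrative names.

```python
import pandas as pd

# Data rebuilt from the output above: group 1 now has two Y rows.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["Y", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})
m = df["Col2"].eq("Y")

# Option A: still random, but repeatable because the seed is fixed.
s_random = (
    df[m].groupby("Col1").sample(1, random_state=0).set_index("Col1")["Col3"]
)

# Option B: deterministic, keeps the first Y row of each group.
s_first = df[m].drop_duplicates("Col1").set_index("Col1")["Col3"]

# Y rows keep their own Col3, other rows take the chosen Y value,
# and groups without any Y row fall back to their own Col3.
df["Col4"] = (
    df["Col3"].where(m).fillna(df["Col1"].map(s_first)).fillna(df["Col3"])
)

print(df["Col4"].tolist())
# ['ABC', 'XX', 'ABC', 'VB', 'AY', 'XX', 'XX', 'XX']
```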