python-3.x pandas pandas-groupby

Pandas dataframe replace column value based on group


I have a dataframe with the structure below:

   master_mac    slave_mac        uuid           rawData               
0  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                         
1  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                         
2  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                          
3  ac233fc01403  ac233f26492b     e2c56db5       ac0228  
4  ac233fc01403  e464eecba5eb     NaN            590080             
5  ac233fc01403  ac233f26492b     e2c56db5       ac0228  
6  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                          
7  ac233fc01403  ac233f26492b     e2c56db5       636800       

The desired result is:

   master_mac    slave_mac        uuid           rawData               
0  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                         
1  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                         
2  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                          
3  ac233fc01403  ac233f26492b     e2c56db5       NaN  
4  ac233fc01403  e464eecba5eb     NaN            590080             
5  ac233fc01403  ac233f26492b     e2c56db5       NaN  
6  ac233fc01403  ac233f26492b     e2c56db5       NaN                                                          
7  ac233fc01403  ac233f26492b     e2c56db5       NaN

Can anyone help me out with this?


Solution

  • Build a boolean mask that is True where uuid is not missing:

    m = df['uuid'].notna()
    

    If the check should be made per group, use GroupBy.transform with GroupBy.any to test whether each group has at least one non-NaN uuid, then set rawData to NaN wherever the mask is True:

    m = df['uuid'].notna().groupby([df['master_mac'],df['slave_mac']]).transform('any')
    
    df['rawData'] = df['rawData'].mask(m)
    print(df)
         master_mac     slave_mac      uuid rawData
    0  ac233fc01403  ac233f26492b  e2c56db5     NaN
    1  ac233fc01403  ac233f26492b  e2c56db5     NaN
    2  ac233fc01403  ac233f26492b  e2c56db5     NaN
    3  ac233fc01403  ac233f26492b  e2c56db5     NaN
    4  ac233fc01403  e464eecba5eb       NaN  590080
    5  ac233fc01403  ac233f26492b  e2c56db5     NaN
    6  ac233fc01403  ac233f26492b  e2c56db5     NaN
    7  ac233fc01403  ac233f26492b  e2c56db5     NaN
    

    Or assign NaN directly with DataFrame.loc:

    import numpy as np

    df.loc[m, 'rawData'] = np.nan
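
For reference, here is a self-contained sketch that runs both variants on the sample data from the question. The frame is retyped by hand from the question, assuming rawData holds strings. The names row_mask, group_mask, masked and df2 are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question (NaN marks missing values).
df = pd.DataFrame({
    "master_mac": ["ac233fc01403"] * 8,
    "slave_mac": ["ac233f26492b"] * 4 + ["e464eecba5eb"] + ["ac233f26492b"] * 3,
    "uuid": ["e2c56db5"] * 4 + [np.nan] + ["e2c56db5"] * 3,
    "rawData": [np.nan, np.nan, np.nan, "ac0228", "590080", "ac0228", np.nan, "636800"],
})

# Row-wise mask: True where this particular row has a uuid.
row_mask = df["uuid"].notna()

# Group-wise mask: True for every row of a (master_mac, slave_mac) group
# that contains at least one non-missing uuid.
group_mask = (
    df["uuid"].notna()
    .groupby([df["master_mac"], df["slave_mac"]])
    .transform("any")
)

# Variant 1: Series.mask returns a new Series with NaN where the mask is True.
masked = df["rawData"].mask(group_mask)

# Variant 2: assign NaN through DataFrame.loc, done on a copy here so that
# the original frame stays available for comparison.
df2 = df.copy()
df2.loc[group_mask, "rawData"] = np.nan

print(df.assign(rawData=masked))
```

On this sample the row-wise and group-wise masks happen to select the same rows, because every row of the ac233f26492b group has a uuid. They would differ if some rows of a group were missing a uuid while others had one.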