python pandas group-by conditional-statements switch-statement

Replace every value after the maximum value by 1 in each group


The objective is to assign 1 to every row in a group whose index is higher than the index returned by idxmax() for that group.

import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[1, 1, 1, 2, 2, 2, 3, 3, 3], 'val':[1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

   id  val
0   1  1.0
1   1  NaN
2   1  0.0
3   2  NaN
4   2  1.0
5   2  0.0
6   3  1.0
7   3  0.0
8   3  0.0

We can use idxmax() to get the index of the highest value in each group:

test = df.groupby('id')['val'].idxmax()

id
1    0
2    4
3    6

The objective is to transform the data to look like this, where every value in a group that has a higher index than the one from idxmax() gets assigned a 1:

   id  val
0   1  1.0
1   1  1.0
2   1  1.0
3   2  NaN
4   2  1.0
5   2  1.0
6   3  1.0
7   3  1.0
8   3  1.0

This does not necessarily need to be done with idxmax(); I am open to any suggestions.


Solution

  • If I understand the problem correctly, you can use transform and np.where:

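    # Index label of each group's maximum, as a list ordered by sorted id
    nd = df.groupby('id')['val'].idxmax().tolist()
    # x.name is the group's id, so nd[x.name-1] only works for ids 1, 2, 3, ...
    df['val'] = df.groupby('id')['val'].transform(lambda x: np.where(x.index>nd[x.name-1], 1, x))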
    
    df
    
    Output:
    
        id  val
    0   1   1.0
    1   1   1.0
    2   1   1.0
    3   2   NaN
    4   2   1.0
    5   2   1.0
    6   3   1.0
    7   3   1.0
    8   3   1.0
    
    
    

    Considering the comment, it is probably best to use a dictionary keyed by id in case the df.id column is not sequential. Building it directly from the idxmax() result keeps each id paired with its own index, even when the ids are not sorted:

    nd = df.groupby('id')['val'].idxmax().to_dict()
    
    df['val'] = df.groupby('id')['val'].transform(lambda x: np.where(x.index>nd[x.name], 1, x))
    

    (Note that this approach is significantly slower than the solution offered by not_a_robot.)
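
For comparison, here is a minimal sketch of a vectorized variant that avoids the per-group lambda. It is not necessarily the other answer's method. It assumes the default integer index, so that "after the maximum" means "larger index label", and the names max_idx and row_max_idx are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0],
})

# Index label of each group's maximum, keyed by id (NaN is skipped by default).
max_idx = df.groupby('id')['val'].idxmax()

# Broadcast that label back to every row of its group.
row_max_idx = df['id'].map(max_idx)

# Rows that come after their group's maximum get 1; all others keep their value.
after_max = df.index.to_numpy() > row_max_idx.to_numpy()
df['val'] = np.where(after_max, 1, df['val'])

print(df)
```

Because the lookup goes through map on the id column, this does not depend on the ids being sequential or sorted.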