python pandas indexing

Finding the first row that meets conditions of a mask starting from nth row


This is my dataframe:

import pandas as pd 
df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})

I want to create column c by using a mask. This is my desired output:

     a   b    c
0   20  20  NaN
1   21  20  NaN
2  100  20  NaN
3    4  20  NaN
4  100  20    x
5   20  20  NaN

My mask is:

mask = (df.a > df.b)

Note that I want to start looking for this mask only after the third row. That is, rows 0, 1 and 2 do not count. That is why the first row that satisfies a > b is the fifth row, whose index is 4.

This is what I have tried, but I don't know how to skip the first three rows:

df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'x'
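
To make the problem concrete, here is a small sketch of what the attempt above selects on the example data. The names `first_true` and `marked` are only illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})
mask = (df.a > df.b)

# cumsum() counts the True values seen so far, so eq(1) & mask picks
# the first True in the whole column, with no rows skipped
first_true = mask.cumsum().eq(1) & mask

# index labels of the rows that would be marked with 'x'
marked = first_true[first_true].index.tolist()
print(marked)
```

The selected row is the first match overall, which falls inside the three rows that should be ignored, so this is not the desired row.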

Some additional examples were attached as an image. In each of them, the first three rows do not count.


Solution

  • A robust solution that works regardless of the index and also when the mask has no match:

    import numpy as np
    
    # original mask
    mask = (df.a > df.b)
    
    # set first n values to False (here n = 3, by position, not by index label)
    mask &= np.arange(len(df)) > 2
    
    # only keep first True (if any)
    mask &= ~mask.duplicated()
    
    df.loc[mask, 'c'] = 'x'
    

    Output:

         a   b    c
    0   20  20  NaN
    1   21  20  NaN
    2  100  20  NaN
    3    4  20  NaN
    4  100  20    x
    5   20  20  NaN
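
If the number of skipped rows varies, the same steps can be wrapped in a small function. This is only a sketch: the helper name `mark_first_match` and its parameters `n`, `col` and `value` are hypothetical and not part of pandas. It assumes `mask` is a boolean Series with the same index as `df`, and it creates the target column up front so that it exists even when nothing matches.

```python
import numpy as np
import pandas as pd


def mark_first_match(df, mask, n, col="c", value="x"):
    """Hypothetical helper: mark the first row at position >= n where mask is True."""
    # ignore the first n rows by position, not by index label
    m = mask & (np.arange(len(df)) >= n)
    # keep only the first remaining True (if any)
    m &= ~m.duplicated()
    out = df.copy()
    # object column of NaN, so a string can be written without a dtype change
    out[col] = np.full(len(out), np.nan, dtype=object)
    out.loc[m, col] = value
    return out


# same data as in the question, but with a non-default index
df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]},
                  index=list('uvwxyz'))

out = mark_first_match(df, df.a > df.b, n=3)

# a mask with no match leaves the new column entirely NaN
out_none = mark_first_match(df, df.a > 1000, n=3)
```

The function returns a copy rather than modifying `df` in place, which is a design choice of this sketch and not something the original answer requires.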