[SOLVED] Finding the first row that meets conditions of a mask starting from nth row

This is my dataframe:

import pandas as pd 
df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})

I want to create column c by using a mask. This is my desired output:

     a   b    c
0   20  20  NaN
1   21  20  NaN
2  100  20  NaN
3    4  20  NaN
4  100  20    x
5   20  20  NaN

My mask is:

mask = (df.a > df.b)

Note that I want to start applying this mask only after the third row. That is, rows 0, 1 and 2 do not count. That is why the first row that satisfies a > b is the fifth row, whose index is 4.

This is what I have tried, but I don't know how to skip the first three rows.

df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'x'
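For reference, here is a small sketch of what that attempt selects on the dataframe above. The names first_true and marked are only illustrative. Because the cumulative sum counts every True from the top of the frame, the expression picks the first True overall, which is the row with index 1 rather than index 4.

```python
import pandas as pd

df = pd.DataFrame({'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]})
mask = df.a > df.b

# cumsum counts the True values seen so far; combined with the mask itself,
# eq(1) is True only on the very first True of the whole column
first_true = mask.cumsum().eq(1) & mask

# index labels of the rows the attempt would mark
marked = df.index[first_true].tolist()
print(marked)  # [1]
```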

These are some additional examples. First three rows do not count.

Solution

A robust solution that works regardless of the index labels and also when no row matches the mask:

import numpy as np

# original mask
mask = (df.a > df.b)

# set first n values to False
mask &= np.arange(len(df)) > 2

# only keep first True (if any)
mask &= ~mask.duplicated()

df.loc[mask, 'c'] = 'x'

Output:

     a   b    c
0   20  20  NaN
1   21  20  NaN
2  100  20  NaN
3    4  20  NaN
4  100  20    x
5   20  20  NaN
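
The same steps can be wrapped in a small function so that the number of skipped rows becomes a parameter. This is only a sketch: first_match_after and skip are made-up names, not part of pandas. It assumes the mask is a boolean Series and that "first n rows" means position rather than index label. The example uses a non-default index and a case with no match to illustrate the two robustness claims above.

```python
import numpy as np
import pandas as pd


def first_match_after(mask, skip):
    """Hypothetical helper: keep only the first True in `mask`,
    ignoring the first `skip` rows by position."""
    # drop the first `skip` positions, whatever the index labels are
    out = mask & (np.arange(len(mask)) >= skip)
    # duplicated() flags every repeat of a value, so only the first True survives
    return out & ~out.duplicated()


# same data as above, but with a non-default index
df = pd.DataFrame(
    {'a': [20, 21, 100, 4, 100, 20], 'b': [20, 20, 20, 20, 20, 20]},
    index=list('uvwxyz'),
)

hit = first_match_after(df.a > df.b, skip=3)
df.loc[hit, 'c'] = 'x'  # marks only the row labelled 'y'

# no row qualifies once the first five are skipped: the mask is all False
no_hit = first_match_after(df.a > df.b, skip=5)
```

With skip=3 this reproduces the output above (the fifth row is marked). With skip=5 the mask stays all False, so assigning through it would mark nothing.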