I have a dataframe with columns:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [False, True, False, False, False, False, True, True, False, True],
'B': [True, False, False, False, True, True, False, False, False, False ]
})
df
       A      B
0  False   True
1   True  False
2  False  False
3  False  False
4  False   True
5  False   True
6   True  False
7   True  False
8  False  False
9   True  False
How can I identify and mark the first [True, False] row that occurs after a [False, False] row? Every row that satisfies this condition needs to be flagged in a new column.
In the example above, row 3 [False, False] is followed by row 6 [True, False], and row 8 [False, False] is followed by row 9 [True, False].
These are the only valid matches in this example.
You could use:
# identify start of group
m1 = df.eq([False, False]).all(axis=1)
# condition
m2 = df.eq([True, False]).all(axis=1)
# form groups
group = m1.cumsum()
# keep only rows with valid condition and after a start of group
# get the first value per group
idx = m2[m2 & (group>0)].groupby(group).idxmax().tolist()
# variant
# idx = m2.index.to_series()[m2 & (group>0)].groupby(group).first().tolist()
# assign flag
df.loc[idx, 'flag'] = 'X'
Output:
       A      B flag
0  False   True  NaN
1   True  False  NaN
2  False  False  NaN
3  False  False  NaN
4  False   True  NaN
5  False   True  NaN
6   True  False    X
7   True  False  NaN
8  False  False  NaN
9   True  False    X
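For reuse, the same steps could be wrapped in a small function. This is only a sketch: the function name `flag_first_after` and its parameters (`cols`, `start`, `target`, `flag_col`, `mark`) are hypothetical names chosen for illustration. It uses the `first()` variant from the comment above and works on a copy so the input is not modified.

```python
import pandas as pd


def flag_first_after(df, cols=("A", "B"), start=(False, False),
                     target=(True, False), flag_col="flag", mark="X"):
    """Flag the first `target` row after each `start` row (hypothetical helper)."""
    sub = df[list(cols)]
    # rows equal to the start pair open a new group
    m1 = sub.eq(list(start)).all(axis=1)
    # rows equal to the target pair are the candidates
    m2 = sub.eq(list(target)).all(axis=1)
    group = m1.cumsum()
    # candidates only count once at least one start row has been seen
    valid = m2 & (group > 0)
    # first index label among the valid rows of each group
    idx = valid.index.to_series()[valid].groupby(group[valid]).first().tolist()
    out = df.copy()
    out.loc[idx, flag_col] = mark
    return out


df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})
result = flag_first_after(df)
print(result)
```

Selecting the columns explicitly keeps the comparison valid even if the frame already holds extra columns such as a previous `flag`.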
Intermediates:
       A      B     m1     m2  group flag
0  False   True  False  False      0
1   True  False  False   True      0
2  False  False   True  False      1
3  False  False   True  False      2
4  False   True  False  False      2
5  False   True  False  False      2
6   True  False  False   True      2    X
7   True  False  False   True      2
8  False  False   True  False      3
9   True  False  False   True      3    X
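To inspect these intermediates yourself, one option is to attach the masks and the group counter as temporary columns with `assign`. A minimal sketch, where `debug` is just an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

# start-of-group rows, candidate rows, and the running group counter
m1 = df[['A', 'B']].eq([False, False]).all(axis=1)
m2 = df[['A', 'B']].eq([True, False]).all(axis=1)
group = m1.cumsum()

# side-by-side view; df itself is left unchanged
debug = df.assign(m1=m1, m2=m2, group=group)
print(debug)
```

Rows with `group` equal to 0 come before any [False, False] row, which is why row 1 is never flagged.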
Variant without groupby:
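# identify start of groups
# (select A and B explicitly so this also works if 'flag' was already added)
m1 = df[['A', 'B']].eq([False, False]).all(axis=1)
# condition: a [True, False] row seen after at least one start of group
m2 = (df[['A', 'B']].eq([True, False]).all(axis=1)
      & m1.cummax()
      )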
# form groups
group = m1.cumsum()
idx = group[m2].drop_duplicates().index
# assign flag
df.loc[idx, 'flag'] = 'X'
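As a sanity check, the vectorized result can be compared against a plain loop that spells the rule out step by step. This is a sketch for verification only, not a replacement for the approaches above; `first_after_loop` and the `armed` variable are hypothetical names.

```python
import pandas as pd


def first_after_loop(df, start=(False, False), target=(True, False)):
    """Return index labels of the first `target` row after each `start` row."""
    armed = False  # True once a start row was seen and no target row flagged yet
    hits = []
    for label, a, b in zip(df.index, df['A'], df['B']):
        pair = (bool(a), bool(b))
        if pair == start:
            armed = True
        elif pair == target and armed:
            hits.append(label)
            armed = False  # wait for the next start row
    return hits


df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

# vectorized variant without groupby, as above
m1 = df[['A', 'B']].eq([False, False]).all(axis=1)
m2 = df[['A', 'B']].eq([True, False]).all(axis=1) & m1.cummax()
vectorized = m1.cumsum()[m2].drop_duplicates().index.tolist()

print(first_after_loop(df))  # [6, 9]
print(vectorized)            # [6, 9]
```

The loop is much slower on large frames, so it is mainly useful for testing the vectorized versions on small inputs.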