I have a pandas data frame with a column where a condition is met based on other elements in the data frame (not shown). Additionally, I have a column that extends the validness one row further with the following rule:
If a valid row is followed directly by ExtendsValid, that row is also valid, even if the underlying valid condition doesnt apply. Continue filling valid forward as long as ExtendsValid is 1
I have illustrated the result in column "FinalValid" (desired result. Doesnt have to be a new column, can also fill Valid forward). Note that rows 8 and 9 in the example also become valid. Also note that row 13 does NOT result in FinalValid, because you need a preceding valid row. Preceding valid row can be Valid or an extended valid row.
So far if I had a problem like that I did a cumbersome multi-step process:
I can provide sample code but I am really looking for a totally different, more efficient approach, which of course must be non-iterating as well.
Any ideas would be welcome.
# | Valid | ExtendsValid | FinalValid |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 1 | 0 | 1 |
3 | 0 | 0 | 0 |
4 | 0 | 0 | 0 |
5 | 1 | 0 | 1 |
6 | 0 | 0 | 0 |
7 | 1 | 0 | 1 |
8 | 0 | 1 | 1 |
9 | 0 | 1 | 1 |
10 | 0 | 0 | 0 |
11 | 1 | 0 | 1 |
12 | 0 | 0 | 0 |
13 | 0 | 1 | 0 |
14 | 0 | 0 | 0 |
IIUC, you want to ffill the 1s only if there is an uninterrupted series of 1s starting on Valid and eventually continuing on ExtendsValid.
For this you can use a groupby.cummin
:
df['FinalValid'] = (
(df['Valid']|df['ExtendsValid'])
.groupby(df['Valid'].cumsum())
.cummin()
)
Output:
NB. I slightly modified the input on row 3 to better illustrate the logic.
# Valid ExtendsValid FinalValid
0 1 0 0 0
1 2 1 0 1
2 3 0 0 0
3 4 0 1 0
4 5 1 0 1
5 6 0 0 0
6 7 1 0 1
7 8 0 1 1
8 9 0 1 1
9 10 0 0 0
10 11 1 0 1
11 12 0 0 0