I have a pandas DataFrame with a column like:
0
1
1
2
2
3
4
5
5
0
0
0
I would like to keep any leading zeros as they are, but replace the trailing zeros with one greater than the previous nonzero value (a forward fill plus one). In this case, I'd like the output to be:
0
1
1
2
2
3
4
5
5
6
6
6
How can I go about doing this?
You could mask the zeros, add 1, and ffill:
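import pandas as pd
df = pd.DataFrame({'col': [0, 1, 1, 2, 2, 3, 4, 5, 5, 0, 0, 0]})
m = df['col'].eq(0)    # True where the value is 0
s = df['col'].mask(m)  # zeros become NaN
# each NaN gets (previous nonzero + 1); leading NaNs have nothing to ffill, so they become 0
df['out'] = s.fillna(s.add(1).ffill().fillna(0)).convert_dtypes()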
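Note that this replaces every zero that comes after the first nonzero value, not only the trailing ones. If you really want to target only the trailing zeros:
# True for the final run of zeros: scanning from the end, cummin stays True until the first nonzero
df['out'] = df['col'].mask(df['col'].eq(0)[::-1].cummin(), df['col'].max()+1)
This fills with the column maximum plus one, which equals "previous nonzero value + 1" when the column is non-decreasing, as in the example.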
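If the column is not guaranteed to be non-decreasing, here is a sketch that fills the trailing zeros with the last nonzero value plus one instead of the column maximum. The helper name fill_trailing_zeros is hypothetical, and returning the series unchanged when it has no nonzero value at all is an assumption about the desired behavior.

```python
import pandas as pd


def fill_trailing_zeros(s: pd.Series) -> pd.Series:
    """Replace the trailing run of zeros with (last nonzero value + 1).

    Hypothetical helper. Leading and interior zeros are left unchanged.
    Assumption: if the series contains no nonzero value, it is returned as-is.
    """
    nonzero = s.ne(0)
    if not nonzero.any():
        return s.copy()
    # a row is "trailing" if no nonzero value occurs at or after it:
    # the running count of nonzeros has reached its total and the value itself is 0
    trailing = nonzero.cumsum().eq(nonzero.sum()) & ~nonzero
    last_nonzero = s[nonzero].iloc[-1]
    return s.mask(trailing, last_nonzero + 1)


df = pd.DataFrame({'col': [0, 1, 1, 2, 2, 3, 4, 5, 5, 0, 0, 0]})
df['out'] = fill_trailing_zeros(df['col'])
print(df['out'].tolist())
```

On a non-monotonic column such as [0, 3, 1, 0, 2, 0, 0] this gives [0, 3, 1, 0, 2, 3, 3], whereas max()+1 would fill the trailing zeros with 4.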
Output:
    col  out
0     0    0
1     1    1
2     1    1
3     2    2
4     2    2
5     3    3
6     4    4
7     5    5
8     5    5
9     0    6
10    0    6
11    0    6
Intermediates (first approach):
    col  out      m    s  s.add(1)  .ffill()  .fillna(0)
0     0    0   True  NaN       NaN       NaN         0.0
1     1    1  False  1.0       2.0       2.0         2.0
2     1    1  False  1.0       2.0       2.0         2.0
3     2    2  False  2.0       3.0       3.0         3.0
4     2    2  False  2.0       3.0       3.0         3.0
5     3    3  False  3.0       4.0       4.0         4.0
6     4    4  False  4.0       5.0       5.0         5.0
7     5    5  False  5.0       6.0       6.0         6.0
8     5    5  False  5.0       6.0       6.0         6.0
9     0    6   True  NaN       NaN       6.0         6.0
10    0    6   True  NaN       NaN       6.0         6.0
11    0    6   True  NaN       NaN       6.0         6.0
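The intermediate columns of the first approach can be rebuilt step by step. This is a sketch on the example data; the column labels simply mirror the table headers.

```python
import pandas as pd

df = pd.DataFrame({'col': [0, 1, 1, 2, 2, 3, 4, 5, 5, 0, 0, 0]})

m = df['col'].eq(0)
s = df['col'].mask(m)

# each step of the fill chain kept as its own column
steps = pd.DataFrame({
    'col': df['col'],
    'm': m,
    's': s,
    's.add(1)': s.add(1),
    '.ffill()': s.add(1).ffill(),
    '.fillna(0)': s.add(1).ffill().fillna(0),
})

# final result: keep the nonzero values, fill the NaNs from the last step
steps['out'] = s.fillna(steps['.fillna(0)']).convert_dtypes()
print(steps)
```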
Intermediates (second approach):
    col  out      m    s  df['col'].eq(0)  [::-1].cummin()
0     0    0   True  NaN             True            False
1     1    1  False  1.0            False            False
2     1    1  False  1.0            False            False
3     2    2  False  2.0            False            False
4     2    2  False  2.0            False            False
5     3    3  False  3.0            False            False
6     4    4  False  4.0            False            False
7     5    5  False  5.0            False            False
8     5    5  False  5.0            False            False
9     0    6   True  NaN             True             True
10    0    6   True  NaN             True             True
11    0    6   True  NaN             True             True
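To see why the second approach flags only the trailing zeros, and why the reversed order of the mask is harmless, here is a small check on the example data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': [0, 1, 1, 2, 2, 3, 4, 5, 5, 0, 0, 0]})

is_zero = df['col'].eq(0)
# scanning from the end, the running minimum stays True only until the first nonzero
trailing_rev = is_zero[::-1].cummin()
print(trailing_rev.index.tolist())  # labels run 11, 10, ..., 0

# sorting by label restores the original row order
trailing = trailing_rev.sort_index()
print(trailing.tolist())

# mask() aligns the condition on index labels, so both versions give the same result
same = df['col'].mask(trailing_rev, 6).equals(df['col'].mask(trailing, 6))
print(same)
```

The leading zero at row 0 is never flagged, because by the time the scan reaches it the running minimum has already dropped to False.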
To apply the trailing-zeros version within each group, assuming a group column LOT_ID and the target column STEP_NUMBER:
# apply the trailing-zeros logic separately within each LOT_ID
df['out'] = (df.groupby('LOT_ID')['STEP_NUMBER']
               .transform(lambda x: x.mask(x.eq(0)[::-1].cummin(), x.max()+1)))
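A runnable sketch of the grouped version on made-up data (the two lots and their step numbers below are hypothetical), showing that each lot's trailing zeros are filled from that lot's own maximum:

```python
import pandas as pd

# hypothetical data: two lots, each ending in a run of zeros
df = pd.DataFrame({
    'LOT_ID':      ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'STEP_NUMBER': [0,   1,   2,   0,   1,   1,   3,   0,   0],
})

# transform returns a result aligned to the original rows, one group at a time
df['out'] = (df.groupby('LOT_ID')['STEP_NUMBER']
               .transform(lambda x: x.mask(x.eq(0)[::-1].cummin(), x.max() + 1)))
print(df)
```

Lot A keeps its leading zero and fills its last row with 3; lot B fills its two trailing rows with 4.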