pythonpandasffill

Pandas Dataframe ffill with One Greater the Previous Nonzero Value


I have a pandas DataFrame with a column like:

0
1
1
2
2
3
4
5
5
0
0
0

I would like to leave any leading zeros, but ffill to replace the trailing zeros with one greater than the previous, nonzero value. In this case, I'd like the output to be:

0
1
1
2
2
3
4
5
5
6
6
6

How can I go about doing this?


Solution

  • You could mask, increment and ffill:

    m = df['col'].eq(0)
    s = df['col'].mask(m)
    df['out'] = s.fillna(s.add(1).ffill().fillna(0)).convert_dtypes()
    

    Or, if you really want to only target the trailing zeros:

    df['out'] = df['col'].mask(df['col'].eq(0)[::-1].cummin(), df['col'].max()+1)
    

    Output:

        col  out
    0     0    0
    1     1    1
    2     1    1
    3     2    2
    4     2    2
    5     3    3
    6     4    4
    7     5    5
    8     5    5
    9     0    6
    10    0    6
    11    0    6
    

    Intermediates (first approach):

        col  out      m    s  s.add(1)  .ffill()  .fillna(0)
    0     0    0   True  NaN       NaN       NaN         0.0
    1     1    1  False  1.0       2.0       2.0         2.0
    2     1    1  False  1.0       2.0       2.0         2.0
    3     2    2  False  2.0       3.0       3.0         3.0
    4     2    2  False  2.0       3.0       3.0         3.0
    5     3    3  False  3.0       4.0       4.0         4.0
    6     4    4  False  4.0       5.0       5.0         5.0
    7     5    5  False  5.0       6.0       6.0         6.0
    8     5    5  False  5.0       6.0       6.0         6.0
    9     0    6   True  NaN       NaN       6.0         6.0
    10    0    6   True  NaN       NaN       6.0         6.0
    11    0    6   True  NaN       NaN       6.0         6.0
    

    Intermediates (second approach):

        col  out      m    s  df['col'].eq(0)  [::-1].cummin()
    0     0    0   True  NaN             True            False
    1     1    1  False  1.0            False            False
    2     1    1  False  1.0            False            False
    3     2    2  False  2.0            False            False
    4     2    2  False  2.0            False            False
    5     3    3  False  3.0            False            False
    6     4    4  False  4.0            False            False
    7     5    5  False  5.0            False            False
    8     5    5  False  5.0            False            False
    9     0    6   True  NaN             True             True
    10    0    6   True  NaN             True             True
    11    0    6   True  NaN             True             True
    

    applying per group:

    Assuming a group LOT_ID and the target column STEP_NUMBER:

    df['out'] = (df.groupby('LOT_ID')['STEP_NUMBER']
                 .transform(lambda x: x.mask(x.eq(0)[::-1].cummin(), x.max()+1))
                )