python pandas numpy ffill

Forward fill only a certain value


I have an array that represents object states, where 0 means the object is off and 1 means it is on.

import pandas as pd
import numpy as np

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN

I need to forward fill only the 0 values in it, like below.

>>> df_wanted
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

After browsing similar questions here, I compare the ffill-ed and bfill-ed values and assign back with a mask:

mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

But this does not work when a 0 is not followed by a 1. What would be a more elegant solution that takes such cases into account?


Solution

  • mask = (df.ffill() == 0) alone should suffice for your use case.

    First, df.ffill propagates the last valid observation forward, so NaN rows that come after a 0 are filled with 0s and NaN rows that come after a 1 are filled with 1s. Comparing the result to 0 selects only the rows whose last valid value is 0; use that as the mask to get your final df.

    Example (a 0 and a few NaNs added to the end of your df):

    >>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
    >>> df = pd.DataFrame(s, columns=["s"])
    >>> df
          s
    0   NaN
    1   0.0
    2   NaN
    3   NaN
    4   1.0
    5   NaN
    6   NaN
    7   0.0
    8   NaN
    9   1.0
    10  NaN
    11  NaN
    12  0.0
    13  NaN
    14  NaN
    15  NaN
    >>> 
    >>> 
    >>> df[df.ffill() == 0] = 0
    >>> df
          s
    0   NaN
    1   0.0
    2   0.0
    3   0.0
    4   1.0
    5   NaN
    6   NaN
    7   0.0
    8   0.0
    9   1.0
    10  NaN
    11  NaN
    12  0.0
    13  0.0
    14  0.0
    15  0.0
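
The same idea can also be written without modifying the original data in place. The following is only a sketch: the helper name `ffill_only` and its `value` parameter are hypothetical and not part of the answer above. It relies on `Series.ffill`, `Series.eq` and `Series.mask`, and assumes the states live in a single Series.

```python
import numpy as np
import pandas as pd


def ffill_only(series: pd.Series, value=0) -> pd.Series:
    """Forward fill NaNs only where the last valid observation equals `value`.

    Hypothetical helper for illustration; returns a new Series.
    """
    # After ffill, each position holds the most recent non-NaN value
    # (or NaN if none has been seen yet), so comparing with `value`
    # marks every row that should end up equal to `value`.
    to_fill = series.ffill().eq(value)
    # Series.mask replaces entries where the condition is True
    # and leaves the input Series untouched.
    return series.mask(to_fill, value)


s = pd.Series(
    [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1,
     np.nan, np.nan, 0, np.nan, np.nan, np.nan],
    name="s",
)
result = ffill_only(s)
print(result.tolist())
```

Because the comparison with NaN is always False, leading NaNs and NaNs that come after a 1 are left as they are, while a trailing 0 is carried through to the end.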