pandas  dataframe  missing-data  ffill

Dataframe forward-fill till column-specific last valid index


How do I go from:

[In]:   df = pd.DataFrame({
            'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
            'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan]
        })
        df

[Out]:        col1    col2
        0      100     NaN
        1      NaN     100
        2      NaN     NaN
        3      100     NaN
        4      NaN     100
        5      NaN     NaN

To:

[Out]:        col1    col2
        0      100     NaN
        1      100     100
        2      100     100
        3      100     100
        4      NaN     100
        5      NaN     NaN

My current approach is to apply a custom function that works on one column at a time:

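[In]:   def ffill_last_valid(s):
            # Label of the column's last non-NaN value
            last_valid = s.last_valid_index()
            # ffill() returns a new Series, so the caller's data is untouched
            s = s.ffill()
            # Undo the fill for every row after the last valid label
            s[s.index > last_valid] = np.nan
            return s

        df.apply(ffill_last_valid)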

But this seems like overkill to me. Is there a one-liner that works on the dataframe directly?


Note on accepted answer:

See the accepted answer from mozway below.

I know it's a tiny dataframe, but:

[image not preserved]


Solution

  • You can ffill, then discard everything filled after each column's last valid value, using where with a mask built from notna and a cummax over the reversed rows:

    out = df.ffill().where(df[::-1].notna().cummax())
    

    Variant:

    out = df.ffill().mask(df[::-1].isna().cummin())
    

    Output:

        col1   col2
    0  100.0    NaN
    1  100.0  100.0
    2  100.0  100.0
    3  100.0  100.0
    4    NaN  100.0
    5    NaN    NaN
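
To see why the reversed cummax gives the right mask, the one-liner can be unpacked step by step. This is only an illustrative sketch on the question's example frame; the intermediate names (valid_reversed, keep) are made up for the explanation and are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, 100, np.nan],
})

# Step 1: reverse the row order and flag the cells that hold a value.
valid_reversed = df[::-1].notna()

# Step 2: cummax walks from the bottom row upwards. It stays False
# through the trailing NaNs, flips to True at each column's last valid
# value, and remains True for every row above it.
keep = valid_reversed.cummax()

# Step 3: where() aligns on index labels, so the reversed mask does not
# have to be flipped back. Cells where keep is False become NaN again.
out = df.ffill().where(keep)

print(keep.sort_index())
print(out)
```

The leading NaN in col2 survives because ffill has no earlier value to propagate there; the mask only has to take care of the trailing rows.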
    

    interpolate:

    In theory, df.interpolate(method='ffill', limit_area='inside') should work. Each option works as expected on its own, but for some reason the combination does not (tested with pandas 1.5.2). df.interpolate(method='zero', limit_area='inside') does work, though; note that method='zero' is passed to scipy.interpolate.interp1d, so it requires SciPy.
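
As a possible alternative to interpolate, more recent pandas releases appear to accept limit_area directly in ffill. The sketch below assumes pandas 2.2 or later for that keyword (worth checking against the documentation for your installed version) and falls back to the where/cummax mask otherwise. The helper name ffill_inside is hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, 100, np.nan],
})


def ffill_inside(frame):
    """Forward-fill only the NaNs enclosed by valid values (hypothetical helper)."""
    try:
        # Assumption: pandas >= 2.2, where ffill takes a limit_area keyword.
        return frame.ffill(limit_area="inside")
    except TypeError:
        # Older pandas rejects the unknown keyword, so fall back to
        # the where/cummax mask approach.
        return frame.ffill().where(frame[::-1].notna().cummax())


out = ffill_inside(df)
print(out)
```

Either branch should leave the trailing NaNs of each column untouched, so the result can be compared against the mask-based output to confirm the behaviour on a given pandas version.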