python pandas ffill

How can I forward fill a dataframe column where the limit of rows filled is based on the value of a cell in another column?


I am trying to forward fill a column, with the fill limit taken from the value in another column. This is the code I run, followed by the error message I get.

import pandas as pd
import numpy as np

df = pd.DataFrame()

df['NM'] = [0, 0, 1, np.nan, np.nan, np.nan, 0]

df['length'] = [0, 0, 2, 0, 0, 0, 0]

print(df)

   NM      length
0  0.0       0
1  0.0       0
2  1.0       2
3  NaN       0
4  NaN       0
5  NaN       0
6  0.0       0

df['NM'] = df['NM'].fillna(method='ffill', limit=df['length'])

print(df)

ValueError: Limit must be an integer

The dataframe I want looks like this:

       NM      length
    0  0.0       0
    1  0.0       0
    2  1.0       2
    3  1.0       0
    4  1.0       0
    5  NaN       0
    6  0.0       0

Thanks in advance for any help you can provide!


Solution

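  • The limit argument must be a single integer, so a whole column cannot be passed to it. Instead, you can first split the dataframe into groups, where each group starts at a row with a non-zero value in the length column, and then fill each group with its own limit. The only issue is that for the first group in your example the limit would be 0, which causes an error, so we use max to make sure it is at least 1. This might cause unexpected results if there are NaN values before the first non-zero value in length, but from the given data it's not clear whether that can happen.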

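    # make groups: a new group starts at every row with a non-zero length
    m = df["length"].gt(0).cumsum()

    # fill the column: forward fill NM within each group, using the group's
    # first length as the limit (at least 1, because a limit of 0 is rejected)
    df["NM"] = df.groupby(m).apply(
        lambda f: f["NM"].ffill(limit=max(int(f["length"].iloc[0]), 1))
    ).values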
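
As an alternative to the group-wise fill above, here is a minimal vectorized sketch that avoids the "at least 1" workaround. It assumes the limit for each run of NaN values should come from the length on the row holding the last non-NaN NM value, which matches the example but may not be what you need in general. The function name ffill_with_row_limit and its parameters are made up for illustration.

```python
import numpy as np
import pandas as pd


def ffill_with_row_limit(df, value_col="NM", limit_col="length"):
    """Forward fill value_col, limiting each fill by limit_col.

    Assumption: the limit for a run of NaNs is read from the row that
    holds the last non-NaN value before the run.
    """
    values = df[value_col]
    is_valid = values.notna()

    # Every non-NaN value opens a new block; following NaNs join that block.
    block = is_valid.cumsum()

    # Position inside the block: 0 on the value's own row, then 1, 2, ...
    steps = values.groupby(block).cumcount()

    # Carry the limit from the last non-NaN row down through its block.
    limit = df[limit_col].where(is_valid).ffill()

    # Fill everything, then keep only rows within the allowed distance.
    # Leading NaNs have a NaN limit, so the comparison is False for them.
    return values.ffill().where(steps <= limit)


df = pd.DataFrame(
    {
        "NM": [0, 0, 1, np.nan, np.nan, np.nan, 0],
        "length": [0, 0, 2, 0, 0, 0, 0],
    }
)
result = ffill_with_row_limit(df)
print(df.assign(NM=result))
```

Because the comparison steps <= limit is done per row, a limit of 0 simply means "do not fill", so no special case is needed for rows before the first non-zero length.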