pythonpandasgroup-byconditional-statementsffill

How to forward fill by group and conditional on other columns in python


The original dataset is:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 NaN
A 1997 NaN
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 NaN
B 1995 NaN
B 1998 NaN
B 2001 1
B 2002 NaN

I want to do forward fill by group and conditional on the value of column 'Year': forward fill missing value until the 'Year' is more than five years apart. For example, the value for group A in Year 1992 is 1, so the value for group A in 1995 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1997 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1998 should not be forward filled with 1 since 1998-1992=6 > 5.

The dataset I want is as follows:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 1
A 1997 1
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 1
B 1995 1
B 1998 NaN
B 2001 1
B 2002 1

Solution

  • You can use a double groupby.ffill and mask with where:

    # identify rows within 5 of the previous non-NA value
    m = (df['Year'].where(df['Value'].notna())
         .groupby(df['Group']).ffill()
         .rsub(df['Year']).le(5)
        )
    
    # groupby.ffill and mask
    df['Value'] = df.groupby('Group')['Value'].ffill().where(m)
    

    Output:

       Group  Year  Value
    0      A  1990    NaN
    1      A  1992    1.0
    2      A  1995    1.0
    3      A  1997    1.0
    4      A  1998    NaN
    5      A  2001    NaN
    6      A  2002    1.0
    7      B  1991    1.0
    8      B  1992    1.0
    9      B  1995    1.0
    10     B  1998    NaN
    11     B  2001    1.0
    12     B  2002    1.0