The original dataset is:
Group | Year | Value |
---|---|---|
A | 1990 | NaN |
A | 1992 | 1 |
A | 1995 | NaN |
A | 1997 | NaN |
A | 1998 | NaN |
A | 2001 | NaN |
A | 2002 | 1 |
B | 1991 | 1 |
B | 1992 | NaN |
B | 1995 | NaN |
B | 1998 | NaN |
B | 2001 | 1 |
B | 2002 | NaN |
I want to do forward fill by group and conditional on the value of column 'Year': forward fill missing value until the 'Year' is more than five years apart. For example, the value for group A in Year 1992 is 1, so the value for group A in 1995 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1997 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1998 should not be forward filled with 1 since 1998-1992=6 > 5.
The dataset I want is as follows:
Group | Year | Value |
---|---|---|
A | 1990 | NaN |
A | 1992 | 1 |
A | 1995 | 1 |
A | 1997 | 1 |
A | 1998 | NaN |
A | 2001 | NaN |
A | 2002 | 1 |
B | 1991 | 1 |
B | 1992 | 1 |
B | 1995 | 1 |
B | 1998 | NaN |
B | 2001 | 1 |
B | 2002 | 1 |
You can use a double groupby.ffill
and mask with where
:
# identify rows within 5 of the previous non-NA value
m = (df['Year'].where(df['Value'].notna())
.groupby(df['Group']).ffill()
.rsub(df['Year']).le(5)
)
# groupby.ffill and mask
df['Value'] = df.groupby('Group')['Value'].ffill().where(m)
Output:
Group Year Value
0 A 1990 NaN
1 A 1992 1.0
2 A 1995 1.0
3 A 1997 1.0
4 A 1998 NaN
5 A 2001 NaN
6 A 2002 1.0
7 B 1991 1.0
8 B 1992 1.0
9 B 1995 1.0
10 B 1998 NaN
11 B 2001 1.0
12 B 2002 1.0