Hi I have a dataframe that looks as follows:
ID Year Variable
0 1 1990 1
1 1 1995 10
2 1 1999 12
3 2 2000 3
4 2 2001 4
5 2 2010 12
I would like to generate all the missing years in between within each ID. And then forward fill the variable values. That is to say, the following output dataframe would be produced:
ID Year Variable
0 1 1990 1
1 1 1991 1
2 1 1992 1
3 1 1993 1
4 1 1994 1
5 1 1995 10
6 1 1996 10
7 1 1997 10
8 1 1998 10
9 1 1999 12
10 2 2000 3
11 2 2001 4
12 2 2002 4
13 2 2003 4
14 2 2004 4
15 2 2005 4
16 2 2006 4
17 2 2007 4
18 2 2008 4
19 2 2009 4
20 2 2010 12
The dataframe I am dealing with is quite big, and I need the most efficient way to do this please.
Use Series.reindex
with method='ffill'
for forward filling missing values in lambda function:
out = (df.set_index('Year')
.groupby('ID')['Variable']
.apply(lambda x: x.reindex(range(x.index.min(),x.index.max()+ 1), method='ffill'))
.reset_index())
print (out)
ID Year Variable
0 1 1990 1
1 1 1991 1
2 1 1992 1
3 1 1993 1
4 1 1994 1
5 1 1995 10
6 1 1996 10
7 1 1997 10
8 1 1998 10
9 1 1999 12
10 2 2000 3
11 2 2001 4
12 2 2002 4
13 2 2003 4
14 2 2004 4
15 2 2005 4
16 2 2006 4
17 2 2007 4
18 2 2008 4
19 2 2009 4
20 2 2010 12