pandasdataframeforward-fill

Forward fill dataframe's missing years' rows


Hi I have a dataframe that looks as follows:

    ID  Year    Variable
0   1   1990    1
1   1   1995    10
2   1   1999    12
3   2   2000    3
4   2   2001    4
5   2   2010    12

I would like to generate all the missing years in between within each ID. And then forward fill the variable values. That is to say, the following output dataframe would be produced:

    ID  Year    Variable
0   1   1990    1
1   1   1991    1
2   1   1992    1
3   1   1993    1
4   1   1994    1
5   1   1995    10
6   1   1996    10
7   1   1997    10
8   1   1998    10
9   1   1999    12
10  2   2000    3
11  2   2001    4
12  2   2002    4
13  2   2003    4
14  2   2004    4
15  2   2005    4
16  2   2006    4
17  2   2007    4
18  2   2008    4
19  2   2009    4
20  2   2010    12

The dataframe I am dealing with is quite big, and I need the most efficient way to do this please.


Solution

  • Use Series.reindex with method='ffill' for forward filling missing values in lambda function:

    out = (df.set_index('Year')
             .groupby('ID')['Variable']
             .apply(lambda x: x.reindex(range(x.index.min(),x.index.max()+ 1), method='ffill'))
             .reset_index())
    print (out)
        ID  Year  Variable
    0    1  1990         1
    1    1  1991         1
    2    1  1992         1
    3    1  1993         1
    4    1  1994         1
    5    1  1995        10
    6    1  1996        10
    7    1  1997        10
    8    1  1998        10
    9    1  1999        12
    10   2  2000         3
    11   2  2001         4
    12   2  2002         4
    13   2  2003         4
    14   2  2004         4
    15   2  2005         4
    16   2  2006         4
    17   2  2007         4
    18   2  2008         4
    19   2  2009         4
    20   2  2010        12