Suppose I have a dataframe as follows:
Roll No | Name | School | Year | Total Marks | Previous Marks
------------------------------------------------------------------------
1001 | abc | iisr | 2005 | 595 | NaN
1002 | amr | iisd | 2005 | 599 | NaN
1001 | abc | iisr | 2006 | NaN | 580
1003 | def | hmms | 2005 | 600 | 575
1002 | amr | iisd | 2006 | NaN | 590
1004 | ghi | aaiisr | 2005 | 580 | NaN
1005 | jkl | ups | 2005 | 599 | 500
1001 | abc | iisr | 2007 | 597 | NaN
1002 | amr | iisd | 2007 | 600 | NaN
1005 | jkl | ups | 2006 | NaN | 503
1006 | mno | iisr | 2005 | NaN | 480
1001 | abc | iisr | 2008 | NaN | 575
1007 | pqr | kms | 2005 | NaN | 575
1002 | amr | iisd | 2008 | NaN | 499
I want to forward-fill the columns Total Marks
and Previous Marks
such that for a particular Roll No
, the null values get replaced with their previous occurrence.
For example, in the Total Marks
column, the Roll No 1001 has a value of 595
in 2005, a NaN
in 2006, 597
in 2007, and a NaN
in 2008. I want the null value present for 1001 in the year 2006 to be replaced by its previously filled score (i.e. score of 2005), and the null value of the year 2007 to be replaced with the score value in the year 2008. If previous value is not found, then let it be as NaN
. The same logic to be applied for Previous Marks
column.
My final dataframe after forward-filling in the above mentioned format should be as follows:
Roll No | Name | School | Year | Total Marks | Previous Marks
------------------------------------------------------------------------
1001 | abc | iisr | 2005 | 595 | NaN
1002 | amr | iisd | 2005 | 599 | NaN
1001 | abc | iisr | 2006 | 595 | 580
1003 | def | hmms | 2005 | 600 | 575
1002 | amr | iisd | 2006 | 599 | 590
1004 | ghi | aaiisr | 2005 | 580 | NaN
1005 | jkl | ups | 2005 | 599 | 500
1001 | abc | iisr | 2007 | 597 | 580
1002 | amr | iisd | 2007 | 600 | 590
1005 | jkl | ups | 2006 | 599 | 503
1006 | mno | iisr | 2005 | NaN | 480
1001 | abc | iisr | 2008 | 597 | 575
1007 | pqr | kms | 2005 | NaN | 575
1002 | amr | iisd | 2008 | 600 | 499
I referred several solutions in StackOverflow, GeeksForGeeks, and the pandas documentation also for the pandas ffil()
function, but wasn't of much help. Any ideas how this can be achieved?
Groupby, ffill should do I think.
df[['Total_Marks','Previous_Marks']]=df.groupby('Roll_No')[['Total_Marks','Previous_Marks']].fillna(method='ffill')