python pandas missing-data fillna ffill

Replace missing values based on value of a specific column in Python


I would like to replace missing values based on the values of the column Submitted.

Find below what I have:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN NaN NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN NaN NaN
2020 GER 1 361 321

And this is what I would like to have:

Year Country Submitted Age12 Age14
2018 CHI 1 267 NaN
2019 CHI NaN 267 NaN
2020 CHI 1 244 203
2018 ALB 1 163 165
2019 ALB 1 NaN NaN
2020 ALB 1 161 NaN
2018 GER 1 451 381
2019 GER NaN 451 381
2020 GER 1 361 321

I tried df.fillna(axis=0, method='ffill'), but this replaces every NaN with the previous value. That is not what I want, because some values should stay NaN when the "Submitted" value is 1.

I would like to replace values with those of the previous row only if the respective "Submitted" value is NaN.

Thank you


Solution

  • Try using where together with what you did:

     df = df.where(~df.Submitted.isnull(), df.fillna(axis=0, method='ffill'))
    

    This will replace the entries only when Submitted is null.
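
    Note that the one-liner replaces the whole row wherever Submitted is null, so the Submitted column itself also gets forward-filled, while the desired output keeps it as NaN. A minimal sketch of a variant that only touches the age columns is below. It rebuilds the sample frame from the question; the names value_cols, filled and mask are just illustrative, and grouping by Country is an assumption (so that a value is never carried over from a different country), not something the question asks for explicitly.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Year": [2018, 2019, 2020] * 3,
    "Country": ["CHI"] * 3 + ["ALB"] * 3 + ["GER"] * 3,
    "Submitted": [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    "Age12": [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    "Age14": [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

# Assumption: only these columns should ever be filled
value_cols = ["Age12", "Age14"]

# Forward-fill inside each country (assumption: rows are ordered by year)
filled = df.groupby("Country")[value_cols].ffill()

# Rows where Submitted is missing are the only ones allowed to change
mask = df["Submitted"].isna()

# Copy the forward-filled values into those rows only
df.loc[mask, value_cols] = filled.loc[mask, value_cols]

print(df)
```

    With the sample data, the 2019 ALB row should keep its NaN values because Submitted is 1 there, and Submitted stays NaN in the 2019 CHI and GER rows.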