I would like to replace missing values based on the values of the column Submitted.
Find below what I have:
Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | NaN | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | NaN | NaN |
2020 | GER | 1 | 361 | 321 |
And this is what I would like to have:
Year | Country | Submitted | Age12 | Age14 |
---|---|---|---|---|
2018 | CHI | 1 | 267 | NaN |
2019 | CHI | NaN | 267 | NaN |
2020 | CHI | 1 | 244 | 203 |
2018 | ALB | 1 | 163 | 165 |
2019 | ALB | 1 | NaN | NaN |
2020 | ALB | 1 | 161 | NaN |
2018 | GER | 1 | 451 | 381 |
2019 | GER | NaN | 451 | 381 |
2020 | GER | 1 | 361 | 321 |
I tried df.fillna(axis=0, method='ffill'), but this replaces every NaN with the previous value. That is not what I want, because some values should stay NaN when the "Submitted" value is 1.
I would like to fill a value from the previous row only if the respective "Submitted" value is NaN.
Thank you
Try using where together with what you did:
df = df.where(~df.Submitted.isnull(), df.fillna(axis=0, method='ffill'))
This will replace the entries only when Submitted is null.
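One thing to watch for: because the forward-filled frame replaces the whole row, the one-liner also fills the Submitted column itself in those rows, whereas the desired output keeps it as NaN. A minimal sketch of a variant that touches only the age columns is below. It rebuilds the sample data from the question; the names age_cols, not_submitted, filled and result are just illustrative, and grouping by Country is my own assumption, added so that a value is never carried over from one country to the next.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2018, 2019, 2020] * 3,
    "Country": ["CHI"] * 3 + ["ALB"] * 3 + ["GER"] * 3,
    "Submitted": [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1],
    "Age12": [267, np.nan, 244, 163, np.nan, 161, 451, np.nan, 361],
    "Age14": [np.nan, np.nan, 203, 165, np.nan, np.nan, 381, np.nan, 321],
})

# Assumption: only these columns should ever be filled.
age_cols = ["Age12", "Age14"]

# Rows where nothing was submitted.
not_submitted = df["Submitted"].isna()

# Forward-fill per country (assumption: fills should not cross countries).
filled = df.groupby("Country")[age_cols].ffill()

# Copy the filled values back only into the non-submitted rows.
result = df.copy()
result.loc[not_submitted, age_cols] = filled.loc[not_submitted, age_cols]

print(result)
```

With this version, rows where Submitted is 1 keep their NaN values, and Submitted itself stays NaN in the rows that were filled. If the frame is not already sorted by Year within each country, sort it first, since the forward fill follows row order.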