
Why can't Series.fillna() fill all NaN values?


I want to fill the NaNs in a dataframe with random values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    list(zip(
        ['0001', '0001', '0002', '0003', '0004', '0004'],
        ['a', 'b', 'a', 'b', 'a', 'b'],
        ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
        [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
        [1,2,3,4,5,6])),
    columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1

Out:

    sample ID   compound    country month   value
0   0001        a           USA     NaN     1
1   0001        b           USA     NaN     2
2   0002        a           USA     Jan     3
3   0003        b           USA     NaN     4
4   0004        a           USA     NaN     5
5   0004        b           USA     Jan     6

I slice the DataFrame based on the compound column:

df2 = df1.loc[df1.compound == 'a']
df2

Out:

    sample ID  compound   country month   value
0   0001       a          USA     NaN     1
2   0002       a          USA     Jan     3
4   0004       a          USA     NaN     5
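A quick check makes the key detail visible. Boolean indexing selects rows but keeps their original labels from df1 instead of renumbering them. This sketch rebuilds the question's data with a dict instead of zip, which gives the same frame:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'country': ['USA'] * 6,
    'month': [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
    'value': [1, 2, 3, 4, 5, 6],
})

# Boolean indexing keeps the row labels df1 already had
df2 = df1.loc[df1.compound == 'a']
print(df2.index.tolist())  # [0, 2, 4]
```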

Then I tried to fill the NaNs with non-repeated values from filler:

from numpy.random import default_rng

rng = default_rng()
filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
filler = pd.Series(-abs(filler))

df2.month.fillna(filler, inplace=True)
df2

Out:

    sample ID    compound    country month   value
0   0001         a           USA     -1.0    1
2   0002         a           USA     Jan     3
4   0004         a           USA     NaN     5

I expected no NaN in the output, but the row with label 4 is still NaN. Why?
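The behaviour can be reproduced without randomness. This minimal sketch uses fixed filler values (an assumption, in place of rng.choice) and a Series whose labels 0, 2, 4 mimic df2['month']:

```python
import numpy as np
import pandas as pd

# A column with labels 0, 2, 4, like df2['month'] in the question
month = pd.Series([np.nan, 'Jan', np.nan], index=[0, 2, 4])

# A filler with the default index 0, 1, 2 (fixed values instead of random ones)
filler = pd.Series([0, -1, -2])

# fillna matches the filler by index label: label 0 exists in filler, label 4 does not
result = month.fillna(filler)
print(result.isna().tolist())  # [False, False, True]
```

Because filler is reindexed to the labels 0, 2, 4 before filling, the missing label 4 becomes NaN, which also turns the filler values into floats. That is why the question's output shows -1.0 rather than -1.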


Solution

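  • The problem is that the index of filler differs from the index of df2. When fillna receives a Series, it aligns it on the index labels, not by position. filler has the default labels 0, 1, 2, while df2 keeps the labels 0, 2, 4 it inherited from df1 through boolean indexing. Label 0 matches and is filled, label 2 already holds Jan, and label 4 has no counterpart in filler, so it stays NaN. Give filler the same index as df2: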

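    filler = pd.Series(-abs(filler)).set_axis(df2.index)  # give filler df2's labels (0, 2, 4)
    # Assign back instead of calling fillna(inplace=True) on a column: chained
    # in-place calls stop updating df2 once pandas' Copy-on-Write is enabled.
    df2['month'] = df2['month'].fillna(filler)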
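An end-to-end sketch of the fix follows. It rebuilds the question's data, seeds the generator with 0 (an arbitrary choice, only for reproducibility), and builds filler directly with index=df2.index, which has the same effect as set_axis. The .copy() makes df2 an independent frame, so assigning a column to it does not raise a SettingWithCopyWarning and leaves df1 unchanged.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'country': ['USA'] * 6,
    'month': [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
    'value': [1, 2, 3, 4, 5, 6],
})

# Independent copy of the 'a' rows; its labels are still 0, 2, 4
df2 = df1.loc[df1.compound == 'a'].copy()

rng = default_rng(0)  # fixed seed only so the run is reproducible
values = rng.choice(len(df2), size=len(df2), replace=False)  # a permutation of 0..n-1

# Build the filler on df2's own labels so fillna can align it row by row
filler = pd.Series(-values, index=df2.index)

df2['month'] = df2['month'].fillna(filler)
print(df2)
```

Since the values from rng.choice are never negative here, -values gives the same result as -abs(values) in the question.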