python fillna

I want to replace NaN values with row averages, but the first column (the first value in each row) is a string


I want to replace NaNs with the row averages, but the first value in each row is a string, so Python can't calculate the row averages (axis=1). How can I replace NaNs with row averages and keep the first value as the name of that row?

Index    Country   1990  1995  2000  2005  2010  2015
0        US        5     6     NaN   9     19    11
1        Germany   5     NaN   3     7     19    9
.        ...       ..    ..    ..    ..    ..    ..
.        ...       ..    ..    ..    ..    ..    ..

I tried something like this: df.fillna(df.drop['country'] , axis=1).apply(lambda x: x.mean() ,axis=1). It doesn't replace the NaNs; fillna has no effect. I have no problem replacing NaNs along axis=0.


Solution

  • Suppose data.csv looks like this:

    x,y,a,b,c,d
    0,"A",1,2,3,4
    1,"B",5,6,7,
    2,"C",9,,11,
    

    It looks like you were after something like this:

    from pandas import read_csv
    
    df = read_csv('data.csv')
    print(df)
    
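    # For each row (axis=1), fill NaN with the mean of that row's values,
    # skipping the first two columns (x and y) when computing the mean.
    df = df.apply(lambda x: x.fillna(x[2:].mean()), axis=1)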
    print(df)
    

    Output:

       x  y  a    b   c    d
    0  0  A  1  2.0   3  4.0
    1  1  B  5  6.0   7  NaN
    2  2  C  9  NaN  11  NaN
       x  y  a     b   c     d
    0  0  A  1   2.0   3   4.0
    1  1  B  5   6.0   7   6.0
    2  2  C  9  10.0  11  10.0
    

    It works because the lambda skips the first two columns (x and y) when computing the mean:

    x.fillna(x[2:].mean())
    

    Because of axis=1, the lambda is applied row by row, so each NaN is filled with the mean of its own row.

    Note: pandas stores the columns that contain NaN as floating point and keeps the others as integers. There are ways to fix that, but I decided to keep the answer simple.
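
As a possible alternative, here is a minimal sketch that avoids the row-wise lambda. It assumes a table shaped like the one in the question, with a Country column followed by numeric year columns; the values are the two sample rows shown there. The row means are computed once over the numeric columns, and each column is then filled from them, relying on Series.fillna aligning a Series argument on the index. The names year_cols and row_means are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data taken from the two rows shown in the question.
df = pd.DataFrame({
    "Country": ["US", "Germany"],
    "1990": [5, 5],
    "1995": [6, np.nan],
    "2000": [np.nan, 3],
    "2005": [9, 7],
    "2010": [19, 19],
    "2015": [11, 9],
})

# Every column except the string column (illustrative name).
year_cols = df.columns.drop("Country")

# Mean of each row over the numeric columns; NaN is skipped by default.
row_means = df[year_cols].mean(axis=1)

# Fill each column's NaN with the mean of the row it sits in.
# fillna aligns row_means with the column on the row index.
df[year_cols] = df[year_cols].apply(lambda col: col.fillna(row_means))

print(df)
```

The Country column is never passed to mean or fillna, so it stays untouched and still labels each row.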