I want to replace NaNs with the row averages, but the first value in each row is a string, so pandas can't compute the row mean (axis=1). How can I replace the NaNs with row averages while keeping the first value as the name of that row?
Index  Country  1990  1995  2000  2005  2010  2015
0      US       5     6     NaN   9     19    11
1      Germany  5     NaN   3     7     19    9
.      ...      ..    ..    ..    ..    ..    ..
.      ...      ..    ..    ..    ..    ..    ..
I tried something like this:
df.fillna(df.drop['country'], axis=1).apply(lambda x: x.mean(), axis=1)
It doesn't replace the NaNs; fillna seems to have no effect. Replacing NaNs along axis=0 works without problems.
If data.csv looks like this:
x,y,a,b,c,d
0,"A",1,2,3,4
1,"B",5,6,7,
2,"C",9,,11,
It looks like you were after something like this:
from pandas import read_csv

df = read_csv('data.csv')
print(df)
# Fill each row's NaNs with the mean of that row's values from the third column on
df = df.apply(lambda x: x.fillna(x[2:].mean()), axis=1)
print(df)
Output:
   x  y  a     b   c     d
0  0  A  1   2.0   3   4.0
1  1  B  5   6.0   7   NaN
2  2  C  9   NaN  11   NaN

   x  y  a     b   c     d
0  0  A  1   2.0   3   4.0
1  1  B  5   6.0   7   6.0
2  2  C  9  10.0  11  10.0
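It works because the lambda skips the first two columns when computing the mean, so the string column never enters the calculation (and mean() ignores NaN by default):
x.fillna(x[2:].mean())
Passing axis=1 to apply makes it call the lambda once per row.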
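The same idea can be sketched against the table from the question. This is only an illustration under some assumptions: the frame is built by hand from the two visible rows, the year columns are stored as strings, and the names year_cols and row_means are made up for the example. Instead of slicing by position, it selects the numeric columns by dropping "Country" from the column list, so the country name stays untouched.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the two rows shown in the question.
df = pd.DataFrame({
    "Country": ["US", "Germany"],
    "1990": [5, 5],
    "1995": [6, np.nan],
    "2000": [np.nan, 3],
    "2005": [9, 7],
    "2010": [19, 19],
    "2015": [11, 9],
})

# Every column except the string column holds values to average.
year_cols = df.columns.drop("Country")

# Row-wise mean over the year columns; NaN is skipped by default.
row_means = df[year_cols].mean(axis=1)

# Fill each row's NaNs with that row's own mean (row.name is the row label).
df[year_cols] = df[year_cols].apply(
    lambda row: row.fillna(row_means.loc[row.name]), axis=1
)

print(df)
```

With these values the US row mean is 10.0 and the Germany row mean is 8.6, so those are the numbers that end up in the former NaN cells.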
Note: pandas stores the columns that contain NaN as floating point and keeps the others as integers. There are ways to fix that, but I decided to keep the answer simple.
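One possible way to get integer columns back, sketched under the assumption that every filled value happens to be a whole number (true for this sample, not in general). The CSV is inlined through StringIO so the snippet runs on its own, and value_cols, row_means and filled are names invented for the example.

```python
from io import StringIO

import pandas as pd

# Same content as data.csv above, inlined so the example is self-contained.
csv_text = 'x,y,a,b,c,d\n0,"A",1,2,3,4\n1,"B",5,6,7,\n2,"C",9,,11,\n'
df = pd.read_csv(StringIO(csv_text))

value_cols = ["a", "b", "c", "d"]

# Fill each row's NaNs with the mean of that row's non-missing values.
row_means = df[value_cols].mean(axis=1)
filled = df[value_cols].apply(
    lambda row: row.fillna(row_means.loc[row.name]), axis=1
)

# Cast back to integers only if no fractional part would be lost.
if (filled == filled.round()).all().all():
    filled = filled.astype("int64")

df[value_cols] = filled
print(df.dtypes)
```

If a row mean is fractional, the check fails and the columns simply stay as floats, which is usually the safer outcome.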