I have a dataframe
import pandas as pd

df = pd.DataFrame({
"species":["cat","dog","dog","cat","cat"],
"weight":[5,4,3,7,None],
"length":[12,None,13,14,15],
})
species weight length
0 cat 5.0 12.0
1 dog 4.0 NaN
2 dog 3.0 13.0
3 cat 7.0 14.0
4 cat NaN 15.0
and I want to fill the missing data with the average for the species, i.e.,
df.loc[1,"length"] = 13 # the average dog length
df.loc[4,"weight"] = 6 # (5+7)/2 the average cat weight
How do I do that?
(Presumably I need to pass value=DataFrame to df.fillna, but I don't see an easy way to construct the frame.)
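Use groupby with transform to build a frame of per-species means that has the same index as df, then pass that frame to fillna:
df.fillna(df.groupby('species').transform('mean'))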
which returns
species weight length
0 cat 5.0 12.0
1 dog 4.0 13.0
2 dog 3.0 13.0
3 cat 7.0 14.0
4 cat 6.0 15.0
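To see why this works, it can help to split the one-liner into two steps. The sketch below is a self-contained version of the same approach; the variable names `group_means` and `filled` are only illustrative and do not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "cat", "cat"],
    "weight": [5, 4, 3, 7, None],
    "length": [12, None, 13, 14, 15],
})

# Per-species means, broadcast back to one row per original row.
# The grouping column "species" is not part of this intermediate frame.
group_means = df.groupby("species").transform("mean")

# fillna aligns the two frames on index and column labels, so only the
# missing cells in "weight" and "length" are replaced; existing values
# and the "species" column are left as they are.
filled = df.fillna(group_means)

print(filled)
```

Note that fillna returns a new frame here and leaves df unchanged, so assign the result back (df = df.fillna(...)) if you want to keep it.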