I have a dataframe with a group column and a values column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'group': ['CA', 'WA', 'CO', 'AZ', 'MA'] * 10,
                   'value': pd.Series(list(range(5)) + [np.nan]).sample(50, replace=True)})
How can I use groupby on the group column to get the proportion of NaNs in the value column?
The following should do:
df.groupby('group').apply(lambda x: x.value.isnull().sum()/len(x))
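The key here is to use the .isnull method of a Series to flag the rows that are NaN. Summing that boolean mask counts the NaNs in each group, and dividing by len(x) turns the count into a proportion. The output looks like this (exact values depend on the random sample):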
group
AZ 0.3
CA 0.1
CO 0.1
MA 0.1
WA 0.1
dtype: float64
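As a possible alternative, the same proportion can be computed without apply, because the mean of a boolean mask is the fraction of True values. The sketch below uses a small hand-made frame (the name demo and its data are made up for illustration, not the random sample from the question) so the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

# Illustrative data only: CA has 1 NaN out of 2 rows,
# WA has 2 NaNs out of 3 rows, AZ has none.
demo = pd.DataFrame({
    'group': ['CA', 'CA', 'WA', 'WA', 'WA', 'AZ'],
    'value': [1.0, np.nan, np.nan, np.nan, 3.0, 2.0],
})

# isnull() returns a boolean Series; grouping it by the group column
# and taking the mean gives the share of NaNs per group.
nan_share = demo['value'].isnull().groupby(demo['group']).mean()
print(nan_share)
```

On the original df this would be df['value'].isnull().groupby(df['group']).mean(), which should give the same numbers as the apply version above.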
I hope this proves helpful.