
Python2: pandas groupby get proportion of NaN in each group


I have a dataframe with a group column and a values column:

import numpy as np
import pandas as pd
df = pd.DataFrame({'group': ['CA', 'WA', 'CO', 'AZ', 'MA'] * 10,
                   'value': pd.Series(list(range(5)) + [np.nan]).sample(50, replace=True)})

How can I use groupby on the group column to get the proportion of NaNs in the value column?


Solution

  • The following should do:

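    df.groupby('group').apply(lambda x: x.value.isnull().sum() / float(len(x)))
    

    The key here is to use the .isnull method of a Series to flag the rows that are NaN, then divide the number of flagged rows by the group size to get the proportion. The float() call matters on Python 2, where / between two integers is floor division, so every proportion below 1 would otherwise come out as 0. With a random sample like yours, the output looks something like this: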

    group
    AZ    0.3
    CA    0.1
    CO    0.1
    MA    0.1
    WA    0.1
    dtype: float64
    

    I hope this proves helpful.
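
A possible alternative, sketched under the assumption of a small hand-made frame (the values below are illustrative, not the random sample from the question): the mean of a boolean mask is the fraction of True entries, so grouping the .isnull() mask and taking the mean gives the NaN proportion directly. It needs no lambda, and because the mean is always a float it does not run into integer division on Python 2. The name nan_share is just a placeholder.

```python
import numpy as np
import pandas as pd

# Illustrative data: two rows per group, with NaNs placed by hand.
df = pd.DataFrame({'group': ['CA', 'WA', 'CO', 'AZ', 'MA'] * 2,
                   'value': [0, np.nan, 2, 3, 4, np.nan, np.nan, 1, 2, 3]})

# Boolean mask of missing values, grouped by the group column;
# the mean of a boolean Series is the share of True values.
nan_share = df['value'].isnull().groupby(df['group']).mean()

print(nan_share)  # CA is 0.5, WA is 1.0, the other groups are 0.0
```

On the question's df the same one-liner applies unchanged; only the numbers differ because the sample is random.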