pythonpandasgroup-by

Calculate percentage of flag grouped by another column


I have the following dataframe:

Account ID Subscription type Cancellation flag
123 Basic 1
222 Basic 0
234 Hybrid 1
345 Hybrid 1

Now I would like to calculate the percentage of cancellations, but grouped by the subscription type. I would like to get it in a format so that I can easily create a bar chart out of the percentages grouped by the subscription type.


Solution

  • Use a groupby.mean:

    out = df.groupby('Subscription type')['Cancellation flag'].mean().mul(100)
    

    Output:

    Subscription type
    Basic      50.0
    Hybrid    100.0
    Name: Cancellation flag, dtype: float64
    

    Then plot.bar:

    out.plot.bar()
    

    enter image description here

    Or directly with seaborn.barplot:

    import seaborn as sns
    sns.barplot(df, x='Subscription type', y='Cancellation flag',
                estimator='mean', errorbar=None)
    

    Output:

    enter image description here