python group-by catplot

Problems with label names of a groupby.value_counts() object


First of all, I apologize for my English, and thanks for your time.

I have a problem with the labels of a df or Series when drawing a catplot with seaborn.

I have a df like this (from data that was reshaped with pd.melt):

        cardio     variable  value
0            0  cholesterol      0
1            1  cholesterol      1
2            1  cholesterol      1
3            1  cholesterol      0
4            0  cholesterol      0
...        ...          ...    ...
419995       0   overweight      1
419996       1   overweight      1
419997       1   overweight      1
419998       1   overweight      1
419999       0   overweight      0

I need to draw a sns.catplot with that data grouped by 'cardio' and 'variable', and then counted by value. So I wrote this code:

df_cat = df_cat.groupby(['cardio','variable']).value_counts()
df_cat2=df_cat.to_frame()

The problem is that it returns a df with two levels of labels (the top label is the '0'), like this:

                                0
cardio  variable    value   
0       active         1    28643
                       0     6378
        alco           0    33080
                       1     1941
        cholesterol    0    29330
                       1     5691

Since sns.catplot needs a DataFrame with recognizable column names, this '0' column causes problems when creating the catplot. I need to either rename the columns and remove the '0' label from the last df, or name the counts column when I call groupby.value_counts() on the first df. I think the '0' is created automatically because the counts column has no name.

I expect something like this:

cardio  variable    value   count
0       active         1    28643
                       0    6378
        alco           0    33080
                       1    1941
        cholesterol    0    29330
                       1    5691

Solution

  • value_counts returns a Series with a MultiIndex. Just reset the index and rename the counts column. The example below uses fake data.

    import pandas as pd
    import numpy as np
    
    n = 50
    cats = ['cholesterol', 'active', 'alco']
    
    data = {'cardio': np.random.randint(2, size=n), 
            'variable': np.random.choice(cats, size=n), 
            'value':np.random.randint(2, size=n)}
    
    df = pd.DataFrame.from_dict(data)
    
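    # reset_index(name=...) turns the index levels into columns and names the
    # counts column, whether the Series is unnamed or already called 'count'
    df_plot = (df
        .value_counts(subset=['cardio','variable', 'value'])
        .reset_index(name='counts')
    )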
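
The same idea should also work on the groupby call from the question. Below is a minimal sketch, assuming a pandas version where DataFrameGroupBy.value_counts is available (the question's code implies it is). The small df_cat frame is made-up illustration data, not the asker's real data, and the column name 'count' is simply the name from the expected output.

```python
import pandas as pd

# Made-up miniature of the melted frame from the question (illustration only)
df_cat = pd.DataFrame({
    'cardio':   [0, 0, 0, 1, 1, 1],
    'variable': ['active', 'active', 'alco', 'active', 'alco', 'alco'],
    'value':    [1, 1, 0, 0, 1, 1],
})

# groupby(...).value_counts() gives a Series indexed by cardio/variable/value;
# reset_index(name='count') flattens it into ordinary, named columns
counts = (df_cat
    .groupby(['cardio', 'variable'])
    .value_counts()
    .reset_index(name='count')
)

print(counts)
```

With flat column names, a call along the lines of sns.catplot(data=counts, x='variable', y='count', hue='value', col='cardio', kind='bar') should then be able to find every column it needs. The choice of x, hue and col here is only a guess at the intended plot.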