[SOLVED] Problems with label names of a groupby.value_counts() object

First of all, I apologize for my English, and thanks for your time.

I have a problem with the labels of a df or Series when drawing a catplot with seaborn.

I have a df like this (from data that was reshaped with pd.melt):

        cardio     variable  value
0            0  cholesterol      0
1            1  cholesterol      1
2            1  cholesterol      1
3            1  cholesterol      0
4            0  cholesterol      0
...        ...          ...    ...
419995       0   overweight      1
419996       1   overweight      1
419997       1   overweight      1
419998       1   overweight      1
419999       0   overweight      0

I need to draw a sns.catplot with that data grouped by 'cardio' and 'variable', and then counted by value. So I wrote this code:

df_cat = df_cat.groupby(['cardio','variable']).value_counts()
df_cat2=df_cat.to_frame()

The problem is that it returns a df with two levels of labels (the top label is the '0'), like this:

                                0
cardio  variable    value   
0       active         1    28643
                       0     6378
        alco           0    33080
                       1     1941
        cholesterol    0    29330
                       1     5691

Since sns.catplot needs a DataFrame with recognizable column names, this '0' column causes problems when creating the catplot. I need to either rename the columns and remove this '0' label from the last df, or name the counts column when I call groupby.value_counts() on the first df. I think the '0' is created automatically because the counts column has no name.
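The guess about the missing name can be checked with a small sketch. It builds a few counts by hand (the toy Series and the names unnamed, named and flat are made up for illustration; the numbers are copied from the output above) and shows that to_frame() on an unnamed Series labels the column 0, while passing a name avoids it:

```python
import pandas as pd

# Hypothetical toy counts standing in for the grouped result above.
idx = pd.MultiIndex.from_tuples(
    [(0, 'active', 1), (0, 'active', 0), (0, 'alco', 0)],
    names=['cardio', 'variable', 'value'],
)
counts = pd.Series([28643, 6378, 33080], index=idx)  # no name is set

# With no Series name, to_frame() falls back to the integer label 0.
unnamed = counts.to_frame()

# Passing a name gives the counts column a proper label.
named = counts.to_frame(name='count')

# reset_index(name=...) also moves the index levels into ordinary columns.
flat = counts.reset_index(name='count')

print(list(unnamed.columns))
print(list(named.columns))
print(list(flat.columns))
```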

I expect something like this:

cardio  variable    value   count
0       active         1    28643
                       0    6378
        alco           0    33080
                       1    1941
        cholesterol    0    29330
                       1    5691

Solution

value_counts returns a Series with a MultiIndex. Just reset the index and give the counts column a name. Fake data is used in the example below.

import pandas as pd
import numpy as np

n = 50
cats = ['cholesterol', 'active', 'alco']

data = {'cardio': np.random.randint(2, size=n), 
        'variable': np.random.choice(cats, size=n), 
        'value':np.random.randint(2, size=n)}

df = pd.DataFrame.from_dict(data)

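# value_counts returns a Series indexed by (cardio, variable, value).
# reset_index(name=...) turns the index levels into columns and names the
# counts column directly; the counts Series is unnamed (column 0) on older
# pandas and named 'count' since pandas 2.0, so renaming 0 is not reliable.
df_plot = (df
    .value_counts(subset=['cardio','variable', 'value'])
    .reset_index(name='counts')
)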
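
For completeness, here is a rough sketch of the same idea starting from the groupby in the question and ending at the plot. The seeded fake data, the 'count' column name, and the helper draw_catplot are assumptions made for illustration, not something from the original post, and the catplot arguments are only one plausible layout:

```python
import numpy as np
import pandas as pd

# Fake data again, seeded so the sketch is reproducible.
rng = np.random.default_rng(0)
n = 200
cats = ['cholesterol', 'active', 'alco']
df = pd.DataFrame({
    'cardio': rng.integers(2, size=n),
    'variable': rng.choice(cats, size=n),
    'value': rng.integers(2, size=n),
})

# Group by cardio and variable, count each value, then flatten the
# MultiIndex into columns and name the counts column in one step.
df_plot = (
    df.groupby(['cardio', 'variable'])['value']
      .value_counts()
      .reset_index(name='count')
)

def draw_catplot(counts_df):
    """Hypothetical helper: one bar panel per cardio value."""
    import seaborn as sns  # imported here so the counting part runs without seaborn
    return sns.catplot(
        data=counts_df, x='variable', y='count',
        hue='value', col='cardio', kind='bar',
    )
```

Calling draw_catplot(df_plot) should then draw the bars from ordinary, named columns, with no 0 label involved.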