python, seaborn, facet-grid, plot-annotations, countplot

How to annotate bars with 0 counts when there's no data for the category


I am using seaborn's catplot to draw categorical plots onto a FacetGrid. Because I want count plots, I pass kind='count'. The col argument is set to the col_cat variable, which in this context is age_category. age_category is a column in my df that holds age categories as an ordered pandas categorical dtype.

My df is as follows:

ipdb> df
                         spirometryResult_category     age_category habits-smoking
_id                                                                               
63bb97708e5f58ef85f6e4ea                    Normal  20-39 years old            Yes
63bd1b228e5f58ef85f73130                    Normal  20-39 years old            Yes
6423cb1c174e67af0aa0f0fc                    Normal  20-39 years old             No
6423d85e174e67af0aa10cda               Restrictive  20-39 years old             No
6423d8bb174e67af0aa10d98               Obstructive  20-39 years old             No
...                                            ...              ...            ...
6549a0df0941d048fdfd94c4               Obstructive  20-39 years old             No
6549d0ab0941d048fdfd960d                    Normal  40-59 years old             No
6549d0ee0941d048fdfd962b                    Normal  20-39 years old             No
654b17a20941d048fdfda256                    Normal  20-39 years old             No
654d81700941d048fdfdc27d                    Normal  40-59 years old             No

[106 rows x 3 columns]

The age_category column in df is as follows:

ipdb> df['age_category']
_id
63bb97708e5f58ef85f6e4ea    20-39 years old
63bd1b228e5f58ef85f73130    20-39 years old
6423cb1c174e67af0aa0f0fc    20-39 years old
6423d85e174e67af0aa10cda    20-39 years old
6423d8bb174e67af0aa10d98    20-39 years old
                                 ...       
6549a0df0941d048fdfd94c4    20-39 years old
6549d0ab0941d048fdfd960d    40-59 years old
6549d0ee0941d048fdfd962b    20-39 years old
654b17a20941d048fdfda256    20-39 years old
654d81700941d048fdfdc27d    40-59 years old
Name: age_category, Length: 106, dtype: category
Categories (4, object): ['20-39 years old' < '40-59 years old' < '60-79 years old' < '>= 80 years old']

The distribution of categories in the age_category column is as follows:

ipdb> df['age_category'].value_counts()
age_category
20-39 years old    89
40-59 years old    14
60-79 years old     3
>= 80 years old     0
Name: count, dtype: int64

The '>= 80 years old' age category has 0 subjects, which causes problems when I try to annotate the bars for that category.
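The situation can be reproduced in isolation. The following is a minimal sketch with made-up values (not the data above, and the variable names are only illustrative): an ordered categorical keeps categories that have no rows, so value_counts() reports them with a count of 0, and cat.remove_unused_categories() drops them.

```python
import pandas as pd

# Category labels copied from the question; the values below are made up
ages = ["20-39 years old", "40-59 years old", "60-79 years old", ">= 80 years old"]

age_category = pd.Series(
    pd.Categorical(
        ["20-39 years old", "20-39 years old", "40-59 years old", "60-79 years old"],
        categories=ages,
        ordered=True,
    ),
    name="age_category",
)

# Unused categories are still reported, with a count of 0
counts = age_category.value_counts(sort=False)
print(counts)

# Dropping unused categories removes '>= 80 years old' from the dtype
trimmed = age_category.cat.remove_unused_categories()
print(list(trimmed.cat.categories))
```

Whether the empty category should be kept or dropped depends on whether its subplot is wanted in the final figure.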

In general, the code below works. My objective is to plot multiple subplots, one for each age category, showing the subject counts for each combination of spirometryResult_category and habits-smoking.

    # Getting colours as specified in the config, for each hue category
    # Need to remove this hardcoding when I improve the script
    colour_map =  config['seaborn_colourmaps'][hue_cat]

    # Plotting graph
    # count refers to param_category counts
    plt.subplots(figsize=figsize)
    # Not sure why setting axes.labelsize here doesn't
    # work
    sns.set_context('paper', rc={'font.size':fontsize})
    # height=4, aspect=.6,
    g = sns.catplot(
        data=df, x=param_category, hue=hue_cat, col=col_cat,
        kind='count', palette=colour_map, col_wrap=wrap_num,
        saturation=1
    )

    for ax in g.axes: 
        ax.tick_params(left=False, labelbottom=True)
        ax.set_xticklabels(ax.get_xticklabels(), size=fontsize)
        # Replacing subplot title if needed
        if col_cat in config['seaborn_alt_names']:
            new_title = config['seaborn_alt_names'][col_cat]
            ax.set_title( ax.get_title().replace(col_cat, new_title), size=fontsize)
        # Auto-label bars
        for container in ax.containers:
            container.datavalues = np.nan_to_num(container.datavalues)
            ax.bar_label(container, fmt='%.0f', padding=2)

    # In contrast to prev plotting code, despine goes here, as facetgrid
    # requires it to be done this way
    g.despine(top=True, right=True, left=True)
    # Fine adjustment of aesthetics    
    g.set(yticklabels=[], ylabel=None, xlabel=None)
    g.tick_params('x', rotation=90)
    # Checking if legend title is needed
    legend = False
    if 'legend' in plot_info:
        legend = plot_info['legend']
    if not legend:
        g.legend.set_title(None)
    else:
        # If an alternative legend title is specified,
        # use that, if not, use the default one
        if hue_cat in config['seaborn_alt_names']:
            new_title = config['seaborn_alt_names'][hue_cat]
            g.legend.set_title(new_title)
    # Continuing adjustment of aesthetics
    plt.subplots_adjust(hspace=1, wspace=0.3)
    g.figure.savefig(filename, bbox_inches='tight')
    plt.close()

The output picture is shown here:

[Image: spirometry subplots with age categories and smoking status]

As you can see, the ">= 80 years old" category has no subjects, so no "0" labels are drawn in its subplot. All other age categories have their bars and annotations drawn correctly. For the empty category, ax.containers is an empty list, so my loop, for container in ax.containers:, has nothing to iterate over and cannot annotate the 0 counts.

How do I force seaborn to annotate subplots with 0 counts in the correct location (decided automatically by seaborn, so I don't have to hardcode anything) when the category has 0 subjects and ax.containers is an empty list?


Solution

    import pandas as pd
    import seaborn as sns
    
    # sample data
    df = sns.load_dataset('titanic')
    
    # add categories
    df['age_cat'] = pd.cut(x=df.age, bins=range(0, 91, 10), ordered=True)
    
    # remove unused categories
    df['age_cat'] = df['age_cat'].cat.remove_unused_categories()
    
    g = sns.catplot(kind='count', data=df, x='embark_town', hue='sex', col='age_cat', col_wrap=3, height=2.5, aspect=2)
    
    axes = g.axes.flat
    
    for ax in axes:
        for c in ax.containers:
            ax.bar_label(c, fmt='%.0f', padding=2)
    


    Without df['age_cat'].cat.remove_unused_categories(), the unused category still gets its own facet, which contains no bars and therefore no labels.