python seaborn grouped-bar-chart

Annotate seaborn barplot with value from another column


I have created a grouped barplot which expresses the percentage of cases won per time interval. I would like to annotate the barplot with the number of cases won per time interval.

Here is my code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


df = pd.DataFrame({
    'years': ['1994-1998','1999-2003','2004-2008','2009-2013','2013-2017','2018-2022'],
    'Starfish number of cases': [10,8,31,12,2,3],
    'Starfish percent of wins': [0,0.25,0.225806451612903,0.416666666666666,1,0],
    'Jellyfish number of cases':[597,429,183,238,510,595],
    'Jellyfish percent of wins':[0.362646566164154,0.273892773892773,0.423497267759562,0.478991596638655,0.405882352941176,0.408403361344537],

})

df = pd.melt(df, id_vars=['years'], value_vars=['Starfish percent of wins', 'Jellyfish percent of wins'])

sns.set_theme(style="whitegrid")


# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(30, 15))

sns.barplot(x="years", y="value", hue='variable', data=df)


for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() * 1.005, p.get_height() * 1.005))

I have tried to include the number of cases in the melt function (i.e. df = pd.melt(df, id_vars=['years'], value_vars=['Starfish number of cases','Jellyfish number of cases','Starfish percent of wins', 'Jellyfish percent of wins'])) but this adds additional bars representing the total number of cases.

I tried to modify the answer here by adding the lines below, but the results show percentage annotations, not number of cases:

for p,years in zip(ax.patches, df['Starfish number of cases','Jellyfish number of cases']):
    ax.annotate(years, xy=(p.get_x()+p.get_width()/2, p.get_height()),
                ha='center', va='bottom')

There's an answer here, but it's complicated. There must be a simpler way?


Solution

  • The approach below also includes the 'number of cases' columns in the melt, then draws the bar plot from the percentage columns only, by restricting hue_order to them.

    The bars are stored in ax.containers. There are 2 containers, one for each hue value. ax.bar_label() (available since matplotlib 3.4) accepts a container and a list of labels, so the counts for each group can be passed as the labels for the matching bars.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    
    df_orig = pd.DataFrame({
        'years': ['1994-1998', '1999-2003', '2004-2008', '2009-2013', '2013-2017', '2018-2022'],
        'Starfish number of cases': [10, 8, 31, 12, 2, 3],
        'Starfish percent of wins': [0, 0.25, 0.2258064516, 0.41666666666, 1, 0],
        'Jellyfish number of cases': [597, 429, 183, 238, 510, 595],
        'Jellyfish percent of wins': [0.3626465661, 0.2738927739, 0.4234972677, 0.4789915966, 0.4058823529, 0.4084033613],
    })
    
    df = pd.melt(df_orig, id_vars=['years'],
                 value_vars=['Starfish number of cases', 'Starfish percent of wins',
                             'Jellyfish number of cases', 'Jellyfish percent of wins'])
    
    sns.set_theme(style="whitegrid")
    
    # Initialize the matplotlib figure
    fig, ax = plt.subplots(figsize=(12, 5))
    
    sns.barplot(x="years", y="value", hue='variable',
                hue_order=['Starfish percent of wins', 'Jellyfish percent of wins'], data=df, ax=ax)
    
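    # Each container holds the bars of one hue value, in hue_order;
    # pair it with the matching 'number of cases' rows of the melted frame
    for bargroup, variable in zip(ax.containers, ['Starfish number of cases', 'Jellyfish number of cases']):
        # Format the counts as integers; leave the label empty when the count is 0
        labels = ['' if val == 0.0 else f'{val:.0f}' for val in df[df['variable'] == variable]['value']]
        ax.bar_label(bargroup, labels)
    sns.despine()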
    

    [Figure: seaborn bar plot with labels taken from the other column]
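
A possible variation on the approach above, offered as a sketch rather than the definitive way: melt only the percentage columns and read the labels straight from the unmelted frame. It assumes the rows of df_orig are in the same order as the categories on the x axis, and that the containers follow hue_order. The names animals, pct_cols, df_pct and labels_by_animal are introduced here for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df_orig = pd.DataFrame({
    'years': ['1994-1998', '1999-2003', '2004-2008', '2009-2013', '2013-2017', '2018-2022'],
    'Starfish number of cases': [10, 8, 31, 12, 2, 3],
    'Starfish percent of wins': [0, 0.25, 0.2258064516, 0.41666666666, 1, 0],
    'Jellyfish number of cases': [597, 429, 183, 238, 510, 595],
    'Jellyfish percent of wins': [0.3626465661, 0.2738927739, 0.4234972677, 0.4789915966, 0.4058823529, 0.4084033613],
})

# Illustrative helper names (not part of the original answer)
animals = ['Starfish', 'Jellyfish']
pct_cols = [f'{animal} percent of wins' for animal in animals]

# Melt only the percentages, so the plotted frame contains no count rows
df_pct = df_orig.melt(id_vars='years', value_vars=pct_cols)

sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(12, 5))
sns.barplot(data=df_pct, x='years', y='value', hue='variable',
            hue_order=pct_cols, ax=ax)

# Assumption: one container per hue value, in hue_order, with bars in the
# same order as the rows of df_orig
labels_by_animal = {}
for container, animal in zip(ax.containers, animals):
    labels = [str(n) for n in df_orig[f'{animal} number of cases']]
    labels_by_animal[animal] = labels
    ax.bar_label(container, labels=labels)

sns.despine()
```

Because the counts never enter the melted frame, there is nothing to filter out of the plot. The trade-off is that the labels depend on df_orig keeping its row order.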