python seaborn boxplot group

How do I draw side-by-side boxplots in seaborn when the boxplots are already grouped?


I'm looking for a way to draw side-by-side boxplots in seaborn. I have two indexes (index1 and index2) that I want to plot against two pieces of information: info1 (a number) and info2 (a letter). My problem is that the boxplots are already grouped, and I don't understand how to handle the additional dimension.

For now, I can only show the two indexes separately, in two panels (top and middle).

What I would like is to have the boxplots of the two indexes drawn right next to each other.

Something like this, for instance: (image)

I don't know whether this is easily doable.

Here is a short example:

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = plt.subplot(3, 1, 1)
ax2 = plt.subplot(3, 1, 2)
ax3 = plt.subplot(3, 1, 3)

index1 = np.random.random((4, 100, 4))
index2 = np.random.random((4, 100, 4)) / 2.

info1 = np.zeros(shape=index1.shape, dtype='object')
info1[0, :, :] = 'One'
info1[1, :, :] = 'Two'
info1[2, :, :] = 'Three'
info1[3, :, :] = 'Four'

info2 = np.zeros(shape=index1.shape, dtype='object')
info2[:, :, 0] = 'A'
info2[:, :, 1] = 'B'
info2[:, :, 2] = 'C'
info2[:, :, 3] = 'D'

# Build the frame from a dict so Index1/Index2 keep their float dtype
# (stacking strings and floats into one array makes every column 'object').
df = pd.DataFrame({
    'Info1': info1.flatten(),
    'Info2': info2.flatten(),
    'Index1': index1.flatten(),
    'Index2': index2.flatten(),
})


sns.boxplot(x='Info1', y='Index1', hue="Info2", data=df, ax=ax1)
ax1.set_title('Index1')
ax1.set_ylim([0, 1])

sns.boxplot(x='Info1', y='Index2', hue="Info2", data=df, ax=ax2)
ax2.set_ylim([0, 1])
ax2.set_title('Index2')

# sns.boxplot(x='Info1', y='Index1', hue="Info2", data=df, ax=ax3)
ax3.set_ylim([0, 1])
ax3.set_title('Index1 + Index2')

plt.show()



Solution

  • To add another level of grouping, let seaborn create a grid of subplots (called a FacetGrid in seaborn). The figure-level function sns.catplot(kind='box', ...) creates such a FacetGrid for boxplots, and its col= parameter puts each Info1 value in a separate subplot.

    To use Index1/Index2 as hue, both columns first need to be merged into a single long-form column (e.g. via pd.melt(...)).

    In total, catplot supports four groupings: x, hue, col and row.

    Here is what the code and plot could look like. Unfortunately, you can't force such a catplot into a previously created figure.

    import numpy as np
    import seaborn as sns
    import pandas as pd
    import matplotlib.pyplot as plt
    
    index1 = np.random.random((4, 100, 4))
    index2 = np.random.random((4, 100, 4)) / 2.
    
    info1 = np.zeros(shape=index1.shape, dtype='object')
    info1[0, :, :] = 'One'
    info1[1, :, :] = 'Two'
    info1[2, :, :] = 'Three'
    info1[3, :, :] = 'Four'
    
    info2 = np.zeros(shape=index1.shape, dtype='object')
    info2[:, :, 0] = 'A'
    info2[:, :, 1] = 'B'
    info2[:, :, 2] = 'C'
    info2[:, :, 3] = 'D'
    
    # Build the frame from a dict so Index1/Index2 keep their float dtype
    # (stacking strings and floats into one array makes every column 'object').
    df = pd.DataFrame({
        'Info1': info1.flatten(),
        'Info2': info2.flatten(),
        'Index1': index1.flatten(),
        'Index2': index2.flatten(),
    })
    
    df_long = df.melt(id_vars=['Info1', 'Info2'], value_vars=['Index1', 'Index2'], var_name='Index')
    
    sns.catplot(data=df_long, kind='box', col='Info1', x='Info2', y='value', hue='Index', height=3, aspect=1)
    plt.show()
    

    three groupings: catplot of boxplots
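
The reshaping done by df.melt(...) is easier to see on a tiny frame. The sketch below is only an illustration: wide_df and long_df are made-up names, and the six values are invented. It shows how the two index columns end up stacked in a single value column, with a new Index column that can then serve as hue.

```python
import pandas as pd

# Tiny made-up frame in the same "wide" layout as df: one column per index.
wide_df = pd.DataFrame({
    'Info1': ['One', 'One', 'Two'],
    'Info2': ['A', 'B', 'A'],
    'Index1': [0.2, 0.4, 0.6],
    'Index2': [0.1, 0.2, 0.3],
})

# Stack Index1 and Index2 into one 'value' column; the new 'Index' column
# records which original column each value came from.
long_df = wide_df.melt(id_vars=['Info1', 'Info2'],
                       value_vars=['Index1', 'Index2'],
                       var_name='Index')

print(long_df)
```

Each original row appears twice in long_df, once per index, which is what lets seaborn treat "which index" as just another categorical variable.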

    To give the two indexes more similar colors, set the palette= parameter to colors of your choice, e.g. a list of color names or a named palette such as palette='tab20'.

    sns.catplot(data=df_long, kind='box', col='Info1', x='Info2', y='value', height=3, aspect=1,
                hue='Index', palette=['steelblue', 'lightblue'])
    

    dark blue / light blue palette

    To make things more colorful, you can loop through the boxes and color them individually. hue_order= makes sure Index1 is on the left, so the legend can be omitted. The 'tab20' colormap (used as palette) contains alternating dark and light colors.

    g = sns.catplot(data=df_long, kind='box', col='Info1', x='Info2', y='value', height=3, aspect=1,
                    hue='Index', hue_order=['Index1', 'Index2'], legend=False)
    for ax in g.axes.flat:
        num_hues = len(ax.containers)
        boxes_per_hue = len(ax.containers[0].boxes)
        colors = sns.color_palette('tab20', n_colors=num_hues * boxes_per_hue)
        for hue_id, boxes in enumerate(ax.containers):
            for box, color in zip(boxes.boxes, colors[hue_id::num_hues]):
                box.set_color(color)
    

    boxplots with individual colors
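
If the plot really has to go into an existing subplot (such as ax3 in the question), one possible workaround is to skip catplot and use the axes-level sns.boxplot, merging two of the groupings into a single hue column. This is only a sketch under assumptions, not the approach used above: the data is a small synthetic stand-in for df_long, and the combined column name 'Info2 / Index' is made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Small synthetic stand-in with the same columns as df_long (assumed layout).
rows = []
for info1 in ['One', 'Two', 'Three', 'Four']:
    for info2 in ['A', 'B', 'C', 'D']:
        for index_name, scale in [('Index1', 1.0), ('Index2', 0.5)]:
            for value in rng.random(20) * scale:
                rows.append((info1, info2, index_name, value))
df_long = pd.DataFrame(rows, columns=['Info1', 'Info2', 'Index', 'value'])

# Merge Info2 and Index into one label so a single hue covers both groupings.
df_long['Info2 / Index'] = df_long['Info2'] + ' ' + df_long['Index']
hue_order = [f'{info2} {index_name}'
             for info2 in ['A', 'B', 'C', 'D']
             for index_name in ['Index1', 'Index2']]

# An axes-level function accepts ax=, so it can target an existing subplot.
fig, ax = plt.subplots(figsize=(10, 4))
sns.boxplot(data=df_long, x='Info1', y='value', hue='Info2 / Index',
            hue_order=hue_order, palette='tab20', ax=ax)
ax.set_ylim(0, 1)
ax.set_title('Index1 + Index2')
```

Call plt.show() afterwards to display the figure. Because 'tab20' alternates dark and light shades, each Info2 letter gets a dark box for Index1 next to a light box for Index2. The trade-off is eight narrow boxes per Info1 group, so the catplot grid will usually be easier to read.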