python seaborn

Multiple overlapping seaborn violin plots, split by hue


I am trying to create overlapping and transparent violin plots split by one variable using seaborn in python. My dataset looks like this:

[image: preview of the dataset]

The "name" column takes the values "one" to "nine", "distance" ranges from 0 to 1, "condition" is either "healthy" or "disease", and "sample_id" runs from 1 to 16. Each "condition" has 8 sample_ids.

Please see my current result below: [image: current plot]

As you can see, the two halves of each violin have the wrong orientation for every "name" value, and the legend repeats the disease/healthy "condition" entries once for each of the 16 sample_ids.

The code that I am using for this is:

my_ids = my_dataset.sample_id.unique()
my_condition_palette = {"disease": "darkorange", "healthy": "steelblue"}
fig, ax = plt.pyplot.subplots()
for sample_id in my_ids:
    sns.violinplot(data=my_dataset[my_dataset.sample_id == sample_id], x="name", y="distance",
                   hue="condition", hue_order=["disease", "healthy"], palette=my_condition_palette,
                   cut=0, linewidth=0, inner=None, split=True, density_norm="count", common_norm=False, gap=0.1)
for violin in ax.collections:
    violin.set_alpha(1/8)

Does anyone know what I am doing wrong here? Or perhaps there is a better way of plotting this? Thank you!


Solution

  • With density_norm="count", the width of the violin for the x-value with the highest count (for the given sample_id) is maximized. The width of the other violins is shrunk relative to their count.

    In the given dataset, it seems that each sample_id is either fully 'healthy' or fully 'disease'. When drawing a single sample_id, seaborn therefore sees only one active hue value, and that half violin is scaled to occupy the full width at each x-value. You can use dodge=True to force the violin to be narrowed and placed on the correct side.

    For the legend, you can set legend=False for all except one of the sample_ids.

    The following code creates reproducible test data and shows how everything could work. order= sets the order of the x values.

    from matplotlib import pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    
    # first, create some dummy test data
    np.random.seed(20250120)
    df = pd.DataFrame({'sample_id': np.repeat(np.arange(1, 17), 100)})
    names = ['one', 'two', 'three', 'four', 'five', 'six']
    prob = np.random.rand(len(names)) ** 2 + 0.1  # use different probabilities for each 'name'
    prob /= prob.sum()  # the probabilities need to sum to 1
    df['name'] = np.random.choice(names, len(df), p=prob)
    df['distance'] = np.random.rand(len(df))
    df['condition'] = np.where(df['sample_id'] % 2 == 1, 'disease', 'healthy')
    
    my_ids = df.sample_id.unique()
    my_condition_palette = {"disease": "darkorange", "healthy": "steelblue"}
    fig, ax = plt.subplots()
    for sample_id in my_ids:
        sns.violinplot(data=df[df['sample_id'] == sample_id], x="name", y="distance", order=names,
                       hue="condition", hue_order=["disease", "healthy"], palette=my_condition_palette,
                       cut=0, linewidth=0, inner=None, split=True, density_norm="count", common_norm=False, gap=0.1,
                       dodge=True,
                       legend=sample_id == my_ids[0])
    for violin in ax.collections:
        violin.set_alpha(1 / 8)
    sns.despine()
    sns.move_legend(ax, loc="upper left", bbox_to_anchor=(1, 1))
    ax.set_xlabel('')  # remove superfluous x label
    plt.tight_layout()
    plt.show()
    

    [image: seaborn violin plots superimposed]

    PS: This is how the plot looks without dodge=True, and plotting only the first sample. The "half" violins are rescaled to occupy the full width (default 0.8 wide) for each x value.

    [image: violin plot, only the first sample]
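
The diagnosis above can also be checked on the data itself, without plotting. The following is a minimal sketch that assumes the same column names as the example (sample_id, name, distance, condition) and uses a small made-up dataset. The name relative_width is only illustrative: it follows the description of density_norm="count" given above (each count divided by the largest count within the same sample) and is not seaborn's internal code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same columns as the example above
rng = np.random.default_rng(0)
names = ['one', 'two', 'three']
df = pd.DataFrame({'sample_id': np.repeat(np.arange(1, 5), 50)})
df['name'] = rng.choice(names, len(df), p=[0.6, 0.3, 0.1])
df['distance'] = rng.random(len(df))
df['condition'] = np.where(df['sample_id'] % 2 == 1, 'disease', 'healthy')

# 1) Number of distinct conditions per sample_id.
#    If this is 1 everywhere, each per-sample violinplot call sees a single hue value.
conditions_per_sample = df.groupby('sample_id')['condition'].nunique()
print(conditions_per_sample)

# 2) Observations per (sample_id, name), and each count relative to the
#    largest count within the same sample (illustrative approximation only).
counts = df.groupby(['sample_id', 'name']).size().unstack(fill_value=0)
relative_width = counts.div(counts.max(axis=1), axis=0)
print(relative_width.round(2))
```

If conditions_per_sample contains only ones, every call in the loop draws a single hue value, which is the situation where dodge=True is needed. On real data, replace the made-up df with your own dataframe.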