Tags: matplotlib, seaborn, violin-plot

Separate halves of split violinplot to compare tail data


Is there a way to physically separate the two halves of a "split" seaborn violinplot (or other type of violinplot)? I'm trying to compare two different treatments, but there is a skinny tail, and it's difficult (impossible) to tell whether one or both halves of the split violin go up all the way to the tip of the tail.

[Image: example violinplot]

One thought I had was that if the two halves were slightly separated instead of butted right up against each other, it would be much easier to read the data accurately.

Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns

# load data into a dataframe
df1 = pd.read_excel('Modeling analysis charts.xlsx',
                    sheet_name='lmps',
                    usecols=[0, 5],
                    skiprows=0,
                    header=1)

# identify which dispatch run this data is from      
df1['Run']='Scheduling' 

# load data into a dataframe
df2 = pd.read_excel('Modeling analysis charts.xlsx',
                    sheet_name='lmps',
                    usecols=[7, 12],
                    skiprows=0,
                    header=1)

# identify which dispatch run this data is from
df2['Run']='Pricing' 

# drop rows with missing data
df1 = df1.dropna(how='any')
df2 = df2.dropna(how='any')

# merge data from different runs
df = pd.concat([df1,df2])

# LMPs are all opposite of actual values, so correct that
df['LMP'] = -df['LMP']

fontsize = 10

style.use('fivethirtyeight')

fig, axes = plt.subplots()

sns.violinplot(x='Scenario', y='LMP', hue='Run', split=True, data=df,
               inner=None, scale='area', bw=0.2, cut=0, linewidth=0.5,
               ax=axes)
axes.set_title('Day Ahead Market')

#axes.set_ylim([-15,90])
axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

#plt.savefig('DAMarket.pdf', bbox_inches='tight')

plt.show()

Solution

  • EDIT #2: New versions of seaborn (>=0.13.0) now support this feature natively.

    Use the gap keyword argument, e.g.

    sns.violinplot(..., gap=0.1)
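
For a concrete version of the call above, here is a minimal sketch, assuming seaborn >= 0.13 is installed. The data is synthetic and only mimics the column names from the question (Scenario, LMP, Run), and the Agg backend line is only there so the sketch runs without a display. As I read the seaborn docs, gap shrinks each half along the categorical axis by that factor, so larger values should give a wider separation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns  # assumption: seaborn >= 0.13, where `gap` is available

# Synthetic stand-in data (hypothetical; only the column names follow the question)
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "Scenario": rng.choice(["A", "B", "C"], size=n),
    "LMP": rng.standard_normal(n),
    "Run": rng.choice(["Scheduling", "Pricing"], size=n),
})

fig, ax = plt.subplots()

# split=True draws one half-violin per hue level; gap pulls the halves apart
sns.violinplot(data=df, x="Scenario", y="LMP", hue="Run",
               split=True, inner=None, cut=0, gap=0.1, ax=ax)
ax.set_ylabel("LMP ($/MWh)")

# Render once; call plt.show() or fig.savefig(...) to actually look at the result
fig.canvas.draw()
```

Try a few values of gap (say 0.05 to 0.3) to find the point where thin tails on either side become distinguishable without the halves looking like unrelated shapes.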
    

    All hail mwaskom & Co.

    EDIT: For historical reasons this is the accepted answer, but have a look at @conchoecia's more recent and much cleaner implementation.

    Cool idea. My implementation draws the whole plot, grabs the patches corresponding to the two half-violins, and then shifts the paths of those patches left or right. The code is hopefully self-explanatory; otherwise let me know in the comments.


    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.collections
    import seaborn as sns
    import pandas as pd
    
    # create some data
    n = 10000 # number of samples
    c = 5 # classes
    y = np.random.randn(n)
    x = np.random.randint(0, c, size=n)
    z = np.random.rand(n) > 0.5 # sub-class
    data = pd.DataFrame(dict(x=x, y=y, z=z))
    
    # initialise new axis;
    # if there is random other crap on the axis (e.g. a previous plot),
    # the hacky code below won't work
    fig, ax = plt.subplots(1,1)
    
    # plot
    inner = None # Note: 'box' is default
    ax = sns.violinplot(data=data, x='x', y='y', hue='z', split=True, inner=inner, ax=ax)
    
    # offset stuff
    delta = 0.02
    for ii, item in enumerate(ax.collections):
        # axis contains PolyCollections and PathCollections
        if isinstance(item, matplotlib.collections.PolyCollection):
            # get path
            path, = item.get_paths()
            vertices = path.vertices
    
            # shift x-coordinates of path
            if not inner:
                if ii % 2: # -> to right
                    vertices[:,0] += delta
                else: # -> to left
                    vertices[:,0] -= delta
            else: # inner='box' adds another type of PolyCollection
                if ii % 3 == 0:
                    vertices[:,0] -= delta
                elif ii % 3 == 1:
                    vertices[:,0] += delta
                else: # ii % 3 == 2
                    pass