python matplotlib seaborn errorbar stdev

Standard deviation error bars from seaborn seem too small


I originally used the numpy function .std on my dataframe to obtain the standard deviation and plotted it using matplotlib. Later, I tried making the same graph using seaborn. The two graphs looked close enough until I overlaid them and found that all error bars from seaborn are smaller, the difference being more pronounced the bigger they are. I checked in different software that the results from .std are correct and that they are also plotted correctly. What could be the source of the problem? (I can't seem to pull the graph's source data out of seaborn.)

I used this code: ax_sns = sns.barplot(x = 'name', y = column_to_plot, data=data, hue='method', capsize=0.1, ci='sd', errwidth=0.9)

the graph - seaborn errorbars are smaller - the darker ones


Solution

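  • You didn't provide the code where you calculated the standard deviation. Since you called .std() on a dataframe, perhaps you used pandas' .std(). Seaborn uses numpy's. Pandas' std() applies Bessel's correction by default (ddof=1, dividing by n - 1), whereas numpy's std() does not (ddof=0, dividing by n), so the numpy value is smaller by a factor of sqrt((n - 1) / n). The relative difference is most visible when the number of data points is small; the absolute difference grows with the size of the error bar, which matches what you observed.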

    The following code visualizes the difference between error bars calculated via seaborn, numpy and pandas.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    
    flights = sns.load_dataset('flights')
    fig, ax = plt.subplots(figsize=(12, 5))
    sns.barplot(x='month', y='passengers', data=flights, capsize=0.1, ci='sd', errwidth=0.9, fc='yellow', ec='blue', ax=ax)
    
    flights['month'] = flights['month'].cat.codes  # change to a numeric format
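    for month, data in flights.groupby('month'):
        mean = data['passengers'].mean()
        # pandas default is ddof=1 (divide by n - 1, Bessel's correction)
        pandas_std = data['passengers'].std()
        # numpy default is ddof=0 (divide by n)
        numpy_std = np.std(data['passengers'])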
        ax.errorbar(month - 0.2, mean, yerr=numpy_std, ecolor='crimson', capsize=8,
                    label='numpy std()' if month == 0 else None)
        ax.errorbar(month + 0.2, mean, yerr=pandas_std, ecolor='darkgreen', capsize=8,
                    label='pandas std()' if month == 0 else None)
    ax.margins(x=0.015)
    ax.legend()
    plt.tight_layout()
    plt.show()
    

    sns.barplot with numpy vs pandas errorbars
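
The gap between the two sets of error bars can also be checked numerically, without plotting. The following is a minimal sketch, assuming a small made-up sample (`values` is purely illustrative and not taken from the question's data). It compares numpy's default (ddof=0) with pandas' default (ddof=1) and shows that their ratio is sqrt(n / (n - 1)).

```python
import numpy as np
import pandas as pd

# Made-up sample, for illustration only
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(values)

# numpy default: ddof=0, i.e. divide the sum of squared deviations by n
population_std = np.std(values.to_numpy())

# pandas default: ddof=1, i.e. divide by n - 1 (Bessel's correction)
sample_std = values.std()

# The two differ by a constant factor that depends only on n
ratio = sample_std / population_std
expected_ratio = np.sqrt(n / (n - 1))

print(f"numpy  std (ddof=0): {population_std:.4f}")
print(f"pandas std (ddof=1): {sample_std:.4f}")
print(f"ratio: {ratio:.4f}, sqrt(n / (n - 1)): {expected_ratio:.4f}")

# Either library reproduces the other's value once ddof is set explicitly
numpy_with_correction = np.std(values.to_numpy(), ddof=1)
pandas_without_correction = values.std(ddof=0)
```

If the goal is to make hand-computed error bars agree with the numpy-based ones, passing `ddof=0` to pandas' `.std()` (or `ddof=1` to `np.std`) is enough to bring the two in line.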

    PS: Some related posts with additional information: