
Ensuring Consistent Bin Intervals for Multiple Seaborn Histograms with the Freedman-Diaconis Rule


I am trying to plot two histograms side by side: the first for the full dataset and the second for a subset of it. For comparability, both must use the same class intervals, and the bin width must be calculated with the Freedman-Diaconis rule (which, according to a Stack Overflow answer, is probably the default used by sns.histplot).
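The Freedman-Diaconis rule sets the bin width to width = 2 * IQR / n^(1/3), where IQR is the interquartile range and n the number of observations. The sketch below recomputes NumPy's 'fd' edges by hand to show what the rule does. `fd_bin_edges` is a hypothetical helper, not part of NumPy or seaborn, and the data are synthetic stand-ins for the NOX column. It also prints the number of 'auto' bins. NumPy defines 'auto' as the smaller bin width of the Sturges and Freedman-Diaconis estimators, so 'auto' always gives at least as many bins as 'fd'.

```python
import numpy as np


def fd_bin_edges(x):
    """Equal-width bin edges from the Freedman-Diaconis rule (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if lo == hi:  # NumPy widens a zero-length range by 0.5 on each side
        lo, hi = lo - 0.5, hi + 0.5
    q75, q25 = np.percentile(x, [75, 25])
    width = 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)  # FD bin width
    # Round the bin count up, then spread the bins evenly over [min, max],
    # so the actual width is slightly <= the FD width (1 bin if the IQR is 0).
    n_bins = int(np.ceil((hi - lo) / width)) if width > 0 else 1
    return np.linspace(lo, hi, n_bins + 1)


rng = np.random.default_rng(0)
x = rng.lognormal(mean=-0.6, sigma=0.2, size=506)  # synthetic stand-in data

manual = fd_bin_edges(x)
print(len(manual) - 1, "FD bins")
print(np.allclose(manual, np.histogram_bin_edges(x, bins='fd')))
print(len(np.histogram_bin_edges(x, bins='auto')) - 1, "'auto' bins")
```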

I want the first histogram's bins to be the defaults chosen by sns.histplot().
Then I want to extract the list of bin edges (break points) used by the first plot and pass it as an argument when generating the second histogram.
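The plan above can be checked at the NumPy level: compute the edges once on the full data and pass the same array to both histograms. This is a minimal sketch on synthetic data. `full` and `subset` are hypothetical stand-ins for df['NOX'] and the CRIM > 10.73 subset. Because the subset lies inside the full data's range, every subset value falls into one of the shared bins, and each bin covers the same interval in both histograms.

```python
import numpy as np

rng = np.random.default_rng(1)
full = rng.normal(loc=0.55, scale=0.1, size=506)  # stand-in for df['NOX']
subset = full[full > 0.6]  # stand-in for the filtered rows

# Bin edges computed once, from the full data only
edges = np.histogram_bin_edges(full, bins='fd')

counts_full, _ = np.histogram(full, bins=edges)
counts_sub, _ = np.histogram(subset, bins=edges)

# Same edges: bin i is the same interval in both histograms
print(len(edges) - 1, "shared bins")
print(counts_full.sum() == full.size, counts_sub.sum() == subset.size)

# Letting the subset choose its own bins would give different intervals
own_edges = np.histogram_bin_edges(subset, bins='fd')
print(own_edges[:3], edges[:3])
```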

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)

# Histograms
f, axs = plt.subplots(1, 2, figsize=(15, 4.5))
a = sns.histplot(df['NOX'], ax=axs[0], color='steelblue')
b = sns.histplot(df.NOX[df.CRIM > 10.73], ax=axs[1], color='darkgreen')
plt.show()

Questions:
1) How can I extract the list of bins used by sns.histplot()?
2) How can I plot two histograms with the same bins, using the Freedman-Diaconis rule?


Solution

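  • Seaborn generally doesn't expose its intermediate calculations; it just draws the visualization. But you can call the same underlying function to get its result. The default for sns.histplot is bins='auto', which it hands to np.histogram_bin_edges, so bins = np.histogram_bin_edges(..., bins='auto') reproduces the default edges. NumPy's 'auto' uses the smaller bin width of the Sturges and Freedman-Diaconis estimators; use bins='fd' to force the Freedman-Diaconis estimator. Then call sns.histplot(..., bins=bins) for both plots.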

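    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
    
    boston = load_boston()
    df = pd.DataFrame(boston.data, columns=boston.feature_names)
    
    # Compute the bin edges once, from the full dataset ('auto' is histplot's default)
    bins = np.histogram_bin_edges(df['NOX'], bins='auto')
    f, axs = plt.subplots(1, 2, figsize=(15, 4.5))
    sns.histplot(df['NOX'], bins=bins, color='steelblue', ax=axs[0])
    # Reuse the same edges for the subset so the class intervals match
    sns.histplot(df.loc[df['CRIM'] > 10.73, 'NOX'], bins=bins, color='darkgreen', ax=axs[1])
    # Label each non-empty bar with its count as a percentage of the full dataset
    for ax in axs:
        for p in ax.patches:
            x, w, h = p.get_x(), p.get_width(), p.get_height()
            if h > 0:
                # The trailing '\n' pushes the centered label just above the bar
                ax.text(x + w / 2, h, f'{h / len(df) * 100:.2f}%\n', ha='center', va='center', size=8)
        ax.margins(y=0.07)  # leave headroom for the labels
    plt.show()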
    

    (Image: example plot)