python pandas numpy seaborn histplot

How to create a histogram with points rather than bars


I would like to plot a histplot but using points rather than bars.

import pandas as pd
import seaborn as sns
from scipy.stats import binom

x_n10_p0_6 = binom.rvs(n=10, p=0.6, size=10000, random_state=0)
x_n10_p0_8 = binom.rvs(n=10, p=0.8, size=10000, random_state=0)
# note: p=0.6 here, although the variable name suggests p=0.8
x_n20_p0_8 = binom.rvs(n=20, p=0.6, size=10000, random_state=0)

df = pd.DataFrame({
    'x_n10_p0_6': x_n10_p0_6,
    'x_n10_p0_8': x_n10_p0_8,
    'x_n20_p0_8': x_n20_p0_8
    })

sns.histplot(df)

This is what I'm getting:

[image: current sns.histplot output]

I would like to see something like this:

[image: binomial distribution PMF drawn as points, see source below]

Source: https://en.wikipedia.org/wiki/Binomial_distribution#/media/File:Binomial_distribution_pmf.svg

histplot has an element parameter, but it only accepts the values {"bars", "step", "poly"}.


Solution

  • You are working with discrete distributions. A kde plot, by contrast, approximates a continuous distribution by smoothing the input values, so a kdeplot of your discrete values gives only a crude approximation of the plot you seem to be after.

    Seaborn's histplot currently only implements bars for discrete distributions. However, you can mimic a point-style plot in matplotlib by counting how often each integer value occurs and drawing those counts with scatter. Here is an example:

    import matplotlib.pyplot as plt
    from matplotlib.ticker import MaxNLocator
    from scipy.stats import binom
    import pandas as pd
    import numpy as np
    
    x_n10_p0_6 = binom.rvs(n=10, p=0.6, size=10000, random_state=0)
    x_n10_p0_8 = binom.rvs(n=10, p=0.8, size=10000, random_state=0)
    x_n20_p0_8 = binom.rvs(n=20, p=0.6, size=10000, random_state=0)  # note: p=0.6, although the name suggests p=0.8
    
    df = pd.DataFrame({
      'x_n10_p0_6': x_n10_p0_6,
      'x_n10_p0_8': x_n10_p0_8,
      'x_n20_p0_8': x_n20_p0_8
    })
    for col in df.columns:
      xmin = df[col].min()
      xmax = df[col].max()
      # one unit-wide bin centered on each integer from xmin to xmax
      counts, _ = np.histogram(df[col], bins=np.arange(xmin - 0.5, xmax + 1, 1))
      plt.scatter(range(xmin, xmax + 1), counts, label=col)
    plt.legend()
    plt.gca().xaxis.set_major_locator(MaxNLocator(integer=True))  # force integer ticks for discrete x-axis
    plt.ylim(bottom=0)  # start the y-axis at zero
    plt.show()
    

    [Figure: using dots to show a discrete histogram]
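If you would rather stay within seaborn, one possible variant is to count the values yourself and hand the result to sns.scatterplot. This is only a sketch: the helper names count_table and plot_counts and the column names "sample", "x" and "count" are made up for illustration and are not part of seaborn.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.stats import binom


def count_table(df):
    """Return one row per (column, value) pair with its number of occurrences."""
    # wide -> long: one row per observation, tagged with its original column name
    long_df = df.melt(var_name="sample", value_name="x")
    return long_df.groupby(["sample", "x"]).size().reset_index(name="count")


def plot_counts(df):
    """Draw the counts of each integer value as points, one color per column."""
    ax = sns.scatterplot(data=count_table(df), x="x", y="count", hue="sample")
    ax.set_ylim(bottom=0)
    return ax


if __name__ == "__main__":
    # same samples as in the question
    df = pd.DataFrame({
        "x_n10_p0_6": binom.rvs(n=10, p=0.6, size=10000, random_state=0),
        "x_n10_p0_8": binom.rvs(n=10, p=0.8, size=10000, random_state=0),
        "x_n20_p0_8": binom.rvs(n=20, p=0.6, size=10000, random_state=0),
    })
    plot_counts(df)
    plt.show()
```

Unlike the np.histogram version, values that never occur simply get no point here instead of a point at zero.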

    Note that seaborn's histplot has many more options than shown in this example (e.g. scaling the counts down to densities).
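To get closer to the Wikipedia figure, the counts can be scaled to relative frequencies by hand and compared with the exact probability mass function from scipy.stats.binom.pmf. The following is a rough sketch under that assumption; relative_frequencies and plot_against_pmf are hypothetical helper names, not library functions.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom


def relative_frequencies(samples):
    """Return every integer from min to max and the fraction of samples at each."""
    samples = np.asarray(samples)
    values = np.arange(samples.min(), samples.max() + 1)
    # shift so the smallest value maps to index 0, then count occurrences
    counts = np.bincount(samples - samples.min(), minlength=len(values))
    return values, counts / counts.sum()


def plot_against_pmf(n, p, size=10000, seed=0, ax=None):
    """Scatter the sampled frequencies and overlay the exact binomial pmf."""
    if ax is None:
        ax = plt.gca()
    samples = binom.rvs(n=n, p=p, size=size, random_state=seed)
    values, freqs = relative_frequencies(samples)
    ax.scatter(values, freqs, label=f"sampled, n={n}, p={p}")
    ks = np.arange(0, n + 1)
    # thin dotted line as a visual guide; the pmf is only defined at the integers
    ax.plot(ks, binom.pmf(ks, n, p), linestyle=":", linewidth=1)
    return ax


if __name__ == "__main__":
    for n, p in [(10, 0.6), (10, 0.8), (20, 0.6)]:
        plot_against_pmf(n, p)
    plt.legend()
    plt.ylim(bottom=0)
    plt.show()
```

With 10000 samples per distribution, the points should sit close to the dotted pmf curves.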