I'm probably making a stupid mistake, but here's how to reproduce:
import seaborn as sns
from scipy.stats import lognorm
import numpy as np
mu = 25
samples = lognorm.rvs(s=1, loc=0, scale=np.log(mu), size=10000)
shape, loc, scale = lognorm.fit(samples)
print(shape, loc, scale)
fig, ax = plt.subplots()
sns.histplot(samples, bins=50, stat="density", log_scale=True, ax=ax)
xs = np.linspace(0.1, 100, 10000)
ys = lognorm.pdf(xs, s=shape, loc=loc, scale=scale)
ax.plot(xs, ys, "r-")
I don't think the problem is with scipy.stats
. Plotting with matplotlib
, I see good agreement:
from scipy.stats import lognorm
import numpy as np
import matplotlib.pyplot as plt
mu = 25
sample = lognorm.rvs(s=1, loc=0, scale=np.log(mu), size=100000)
shape, loc, scale = lognorm.fit(sample)
fig, ax = plt.subplots()
ax.set_xscale('log')
bins = np.logspace(-2, 2)
ax.hist(sample, bins=bins, density=True)
xs = np.logspace(-2, 2, 1000)
ys = lognorm.pdf(xs, s=shape, loc=loc, scale=scale)
ax.plot(xs, ys, "r-")
You can also plot the histogram of the log of the sample against the corresponding normal distribution with Seaborn.
import seaborn as sns
from scipy.stats import lognorm, norm
import numpy as np
mu = 25
sample = lognorm.rvs(s=1, loc=0, scale=np.log(mu), size=100000)
shape, loc, scale = lognorm.fit(sample)
fig, ax = plt.subplots()
sns.histplot(np.log(sample), bins=50, stat="density", ax=ax)
xs = np.linspace(-2, 5, 1000)
ys = norm.pdf(xs, loc=np.log(np.log(mu)), scale=1)
ax.plot(xs, ys, "r-")
I suspect there's something about the interaction between 'density'
and log_scale
that is not correct, possibly in our understanding of seaborn.
Update: see https://github.com/mwaskom/seaborn/issues/3579 for an explanation of what is going on. Apparently the density normalization is performed on the log-transformed data, even though the data is displayed with the original magnitudes on a log-scaled axis.
If the shape of the seaborn histogram were preserved but the log-scaled axis were replaced by a linear-scaled axis with log-transformed labels, then the area under the curve would be 1.0.
In other words, it can be thought of as a histogram of the log-transformed data but displayed with the original data magnitudes on a log-scaled axis.
This doesn't account for the problem, but noticed that you defined mu=25
and passed scale=np.log(mu)
to lognorm
. Double check that this is what you mean to do against the documentation of lognorm
.
Suppose a normally distributed random variable
X
has meanmu
and standard deviationsigma
. ThenY = exp(X)
is lognormally distributed withs = sigma
andscale = exp(mu)
.