I am trying to fit a gamma CDF using scipy.stats.gamma, but I do not know exactly what the a parameter is or how the location and scale parameters are calculated. Different sources give different ways to calculate them, and it's very frustrating. I am using the code below, which does not give the correct CDF. Thanks in advance.
from scipy.stats import gamma
loc = (np.mean(jan))**2/np.var(jan)
scale = np.var(jan)/np.mean(jan)
Jancdf = gamma.cdf(jan,a,loc = loc, scale = scale)
a is the shape parameter. The moment formulas you tried apply only when loc = 0, and in that case mean**2/var estimates the shape a (not loc), while var/mean estimates the scale. We start with two examples: the first has shape (a) = 10 and scale = 5, and the second, d1plus50, differs from the first only by loc = 50. You can see the shift that loc produces:
import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
d1 = gamma.rvs(a = 10, scale=5,size=1000,random_state=99)
plt.hist(d1,bins=50,label='loc=0,shape=10,scale=5',density=True)
d1plus50 = gamma.rvs(a = 10, loc= 50,scale=5,size=1000,random_state=99)
plt.hist(d1plus50,bins=50,label='loc=50,shape=10,scale=5',density=True)
plt.legend(loc='upper right')
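For the loc = 0 case, the moment formulas from the question can be sketched as below. This is only an illustration on the simulated sample, and the names sample, mom_shape, mom_scale and mom_cdf are made up for this example. It relies on mean = a * scale and var = a * scale**2, which hold only when loc = 0.

```python
import numpy as np
from scipy.stats import gamma

# Simulated sample with known parameters (shape=10, loc=0, scale=5).
sample = gamma.rvs(a=10, scale=5, size=1000, random_state=99)

# Method of moments, valid only under the assumption loc = 0:
# mean = a * scale and var = a * scale**2.
mom_shape = np.mean(sample) ** 2 / np.var(sample)
mom_scale = np.var(sample) / np.mean(sample)

# CDF at the sample points, with the location pinned to 0.
mom_cdf = gamma.cdf(sample, a=mom_shape, loc=0, scale=mom_scale)
print(mom_shape, mom_scale)
```

The estimates should land near the true shape and scale, but they will be off if the data are shifted away from zero.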
So there are three parameters to estimate from the data. One way is to use gamma.fit; here we apply it to the simulated distribution with loc=0:
xlin = np.linspace(0,160,50)
fit_shape, fit_loc, fit_scale=gamma.fit(d1)
print([fit_shape, fit_loc, fit_scale])
[11.135335235456457, -1.9431969603988053, 4.693776771991816]
plt.hist(d1,bins=50,label='loc=0,shape=10,scale=5',density=True)
plt.plot(xlin,gamma.pdf(xlin,a=fit_shape,loc = fit_loc, scale = fit_scale))
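If you already know the data start at zero, one option is to hold the location fixed during the fit so that only shape and scale are estimated. A minimal sketch using the floc keyword of gamma.fit (the names shape0, loc0 and scale0 are just for this example):

```python
from scipy.stats import gamma

d1 = gamma.rvs(a=10, scale=5, size=1000, random_state=99)

# floc=0 holds the location fixed at 0, so only shape and scale are estimated.
shape0, loc0, scale0 = gamma.fit(d1, floc=0)
print([shape0, loc0, scale0])
```

Whether fixing loc is appropriate depends on your data; for shifted data it would bias the other two parameters.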
If we apply gamma.fit to the distribution simulated with loc=50, the loc is recovered closely (about 48), along with the shape and scale:
fit_shape, fit_loc, fit_scale=gamma.fit(d1plus50)
print([fit_shape, fit_loc, fit_scale])
[11.135287555530564, 48.05688649976989, 4.693789434095116]
plt.hist(d1plus50,bins=50,label='loc=50,shape=10,scale=5',density=True)
plt.plot(xlin,gamma.pdf(xlin,a=fit_shape,loc = fit_loc, scale = fit_scale))
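Since the question asks for the CDF, the fitted parameters can be passed to gamma.cdf in the same way as to gamma.pdf. The sketch below does that and compares the result with the empirical CDF as a rough sanity check. The names data, fitted_cdf, empirical_cdf and max_gap are made up for this example, and the comparison is an informal check, not a formal goodness-of-fit test.

```python
import numpy as np
from scipy.stats import gamma

# Same simulated sample as d1plus50 above.
data = gamma.rvs(a=10, loc=50, scale=5, size=1000, random_state=99)
fit_shape, fit_loc, fit_scale = gamma.fit(data)

# Fitted CDF evaluated at the sorted observations.
x = np.sort(data)
fitted_cdf = gamma.cdf(x, a=fit_shape, loc=fit_loc, scale=fit_scale)

# Empirical CDF: fraction of observations <= each sorted value.
empirical_cdf = np.arange(1, len(x) + 1) / len(x)

# Largest vertical gap between the two curves.
max_gap = np.max(np.abs(fitted_cdf - empirical_cdf))
print(max_gap)
```

For your own data, replace data with jan; a small max_gap suggests the fitted gamma CDF tracks the observations reasonably well.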