python-3.xdistributiondata-fittinggamma

Fitting Gamma distribution in Python


I'm finding the parameters of Gamma distribution for a small sample. Later, I need to use the parameters to predict future data. However, the result shown incorrect answer.

This is the result I got from Excel and it was the correct answer I'm looking for Alpha 0.458718895 Beta 96.76626573

import scipy.stats as stats 
data=[0.0621,0.046,0.0324,0.0279]
fit_alpha, fit_loc, fit_beta=stats.gamma.fit(data,floc=0)
print(fit_alpha, fit_loc, fit_beta)
ll=[1,2,3,4,5,6,7,8,9,10]
plop=stats.gamma.pdf(ll,fit_alpha, fit_loc, fit_beta)
print(plop)

Expected result: 6.29% 4.28% 3.40% 2.88% 2.53% 2.27% 2.06% 1.90% 1.76% 1.65%


Solution

  • You are using the fit the wrong way. You try to fit the PDF while scipy.stat is fitting the best underlying distribution to random data. Have a look here:

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as stats 
    from scipy.optimize import leastsq
    
    def my_res( params, yData ):
        a, b = params
        xList= range( 1, len(yData) + 1 )
        th = np.fromiter( ( stats.gamma.pdf( x, a, loc=0, scale=b ) for x in  xList ), np.float )
        diff = th - np.array( yData )
        return diff
    
    data = [ 0.0621, 0.046, 0.0324, 0.0279 ]
    
    ### this does not work as data is supposed to be the random variate data and not the pdf
    fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data, floc=0 )
    print 'data fitted the wrong way:'
    print(fit_alpha, fit_loc, fit_beta)
    
    
    #### but making a least square fit with the pdf works
    sol, err = leastsq( my_res, [.4, 1 ], args=( data, ) )
    print '...and the right way:'
    print sol
    datath = [ stats.gamma.pdf( x, sol[0], loc=0, scale=sol[1]) for x in range(1,5) ]
    
    ### the result gives the expected answer
    ll=[1,2,3,4,5,6,7,8,9,10]
    plop=stats.gamma.pdf(ll, sol[0], loc=0, scale=sol[1])
    print 'expected values:'
    print(plop)
    
    
    ### if we generate random numbers with gamma distribution
    ### the fit does what is should
    
    testData = stats.gamma.rvs(sol[0], loc=0, scale=sol[1], size=5000 )
    print 'using stats.gamma.fit the correct way:'
    print stats.gamma.fit( testData, floc=0 )
    
    fig = plt.figure()
    ax = fig.add_subplot( 1, 1, 1 )
    ax.plot( data , ls='', marker='x')
    ax.plot( datath , ls='', marker='^')
    
    plt.show()
    

    providing

    >> data fitted the wrong way:
    >> (10.36700043818477, 0, 0.00406096249836482)
    >> ...and the right way:
    >> [ 0.45826569 96.8498341 ]
    >> expected values:
    >> [0.06298405 0.04282212 0.0340243  0.02881519 0.02527189 0.02265992 0.02063036 0.01899356 0.01763645 0.01648688]
    >> using stats.gamma.fit the correct way:
    >> (0.454884062189886, 0, 94.94258888249479)
    

    enter image description here