pythonmatplotlibcurve-fittinggaussian

Fitting a Gaussian to a probability distribution to find the standard deviation, in python using matplotlib


Plot of one distribution

I'm quite new to coding so please be patient. I'm trying to model different probability distributions, and I would like to fit a Gaussian to each one and then find the standard deviation of the produced Gaussians, and then compare those std devs.

I'm unsure how to tell python to find the closest Gaussian to my curve, so any help is appreciated.

Here is my code so far, I tried to use curve_fit, but I really don't know how to, so I took it out, and I'm unsure how to define the Gaussian for the best fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg, special, optimize, stats
from numpy import math
from scipy.optimize import curve_fit

q = 4000
qA = np.arange(0.000001,q+1,1.0)
qB = q - qA
N = 2000
NA = 1000
NB = N - NA

ASA = (qA + NA -1)*np.log(qA + NA -1)-(qA*np.log(qA)) - (NA -1)*np.log(NA -1) #sterling approximate of SA/kb

ASB = (qB + NB -1)*np.log(qB + NB -1)-(qB*np.log(qB)) - (NB -1)*np.log(NB -1) #sterling approximate of SB/kb



TATA_list=[]
for i in range(1,len(qA)-1):
    
TATE = (qA[i+1] - qA[i-1]) / (ASA[i+1] - ASA[i-1])
TATA_list.append(TATE)                      

TBTA_list=[]
for i in range(1,len(qA)-1):
    
TBTE = (qB[i+1] - qB[i-1]) / (ASB[i+1] - ASB[i-1])
TBTA_list.append(TBTE)                      

AStot = (q + N -1)*np.log(q + N -1)-(q*np.log(q))-(N-1)*np.log(N-1)
Paprox = np.exp(ASA + ASB - AStot)
SUMPX = sum(Paprox[:-1])
NormPAprox = Paprox/SUMPX

plt.plot(qA/q, NormPAprox)




Solution

  • I start by defining the curve to fit, providing an initial guess of the parameters. Then I run the fit and print out the results.

    enter image description here

    #Define the curve
    def gaussian(x, mean, sigma, amplitude):
        return amplitude * np.exp(-(x - mean)**2 / (2 * sigma**2))
    
    #Drop NaNs
    is_nan = np.isnan(NormPAprox)
    is_valid = ~is_nan
    print('Dropping', is_nan.sum(), 'NaNs')
    x_vals = (qA / q)[is_valid]
    y_vals = NormPAprox[is_valid]
    
    #Initial guess for each parameter
    # mean: x location of the peak
    # std: std of x
    # amplitude: amplitude of the peak
    initial_guess = [x_vals[np.argmax(y_vals)], x_vals.std(), y_vals.max()]
    
    #Get the fitted params
    fit_params, fit_covariance = curve_fit(gaussian, x_vals, y_vals, p0=initial_guess)
    print('Fitted mean, std, amplitude are:', fit_params.round(3))
    print(' Variance of each estimate:', fit_covariance.diagonal().round(3))
    

    Visualise the data and the fit (zoomed onto the peak):

    #Plot original and estimated
    plt.plot(x_vals, y_vals, linewidth=8, color='darkgreen', label='data')
    plt.plot(x_vals, gaussian(x_vals, *fit_params), color='tab:orange', label='fitted')
    plt.xlim(0.45, 0.55)
    plt.title('Data and fit')
    plt.legend()
    plt.xlabel('qA/q')
    plt.ylabel('NormPAprox')
    plt.gcf().set_size_inches(5, 3)