I am working with scipy's curve_fit function, and using pcov to calculate the parameter errors. I understand pcov produce the variance in the parameters which it produces. Variance is obviously calculated from a set of numbers, which are the estimates for the respective parameters. I am wondering if it is possible to produce this set of estimates so that I can see it and try to calculate the variance by hand. Additionally, I would like to plot a histogram from these values. Does anyone know how I could do this, or if there is a separate function which might be able to help?
Here is my current code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2, 4, 9, 16, 25])
def function(x, a, b):
return a * np.exp(-b*x)
popt, pcov = curve_fit(function, x_data, y_data)
a, b = popt
print("pcov =", pcov)
print("a +/- standard deviation =", popt[0], "+/-", pcov[0,0]**0.5)
print("b +/- standard deviation =", popt[1], "+/-", pcov[1,1]**0.5)
print("a +/- standard error =", popt[0], "+/-", ((pcov[0,0]**0.5)/np.sqrt(5)))
print("b +/- standard error =", popt[1], "+/-", ((pcov[1,1]**0.5)/np.sqrt(5)))
print("Confidence Interval:", CI)
plt.plot(x_data, function(x_data, *popt), "k--", label='Curve Fit')
plt.plot(x_data, y_data, 'o', label='Raw Data')
plt.xlabel("Time")
plt.ylabel("AB")
plt.legend()
which produces:
To reiterate, I want to produce the set of values used to calculate the variance of each parameter. Thank you!
If I correctly understand your OP, there are two questions:
Lets build a representative example:
import numpy as np
import numdifftools as nd
import matplotlib.pyplot as plt
from scipy import optimize, stats
We create some data for your model with normal noise of 0.1
sigma:
np.random.seed(123456)
x = np.linspace(-1, 1, 3000)
p = (2.5, 1.2)
y = model(x, *p)
s = 0.1 * np.ones_like(x)
n = s * np.random.normal(size=x.size)
yn = y + n
Method curve_fit
can handle this fit without problem:
popt, pcov = optimize.curve_fit(model, x, yn, sigma=s, absolute_sigma=True)
#(array([2.50042509, 1.19873061]),
# array([[ 5.22097958e-06, -2.50399907e-06],
# [-2.50399907e-06, 1.66936758e-06]]))
From this adjustment, we can estimate the curve and the residuals:
yhat = model(x, *popt)
residuals = yn - yhat
Adjustment looks like:
fig, axe = plt.subplots()
axe.plot(x, yn)
axe.plot(x, yhat)
axe.grid()
We can fit residuals against a normal distribution to check our result:
p = stats.norm.fit(residuals)
# (-3.506208994573257e-05, 0.10181543029512476)
We have residuals centered at almost 0.
with a sigma about 0.1
which was expected. Graphically it leads to:
law = stats.norm(*p)
rlin = np.linspace(residuals.min(), residuals.max(), 200)
fig, axe = plt.subplots()
axe.hist(residuals, 15, density=1.)
axe.plot(rlin, law.pdf(rlin))
axe.grid()
Now we will focus on how covariance matrix is estimated and what is the link with the residuals.
Covariance matrix is estimated as the inverse of the Hessian of the loss function at the optimized solution.
The curve_fit
method use Chi Square Loss function and minimize it wrt to parameters knowing the dataset. We can rewrite such a function as follow:
def loss_factory(x, y, s):
def wrapped(p):
return 0.5 * np.sum(np.power((y - model(x, *p)) / s, 2))
return wrapped
loss = loss_factory(x, yn, s)
The goal is then:
p
, knowing x
, y
and s
;x0
which minimize the loss function;We minimize the loss:
sol = optimize.minimize(loss, x0=[1., 1.], tol=1e-4)
# message: Optimization terminated successfully.
# success: True
# status: 0
# fun: 1554.9574613298098
# x: [ 2.500e+00 1.199e+00]
# nit: 11
# jac: [-9.155e-05 -4.578e-05]
# hess_inv: [[ 5.208e-06 -2.497e-06]
# [-2.497e-06 1.666e-06]]
# nfev: 60
# njev: 20
Solution object already offers an estimate of the inverse of the Hessian which complies with Covariance matrix provided by curve_fit
.
In case of need we can refine it we can do so using the numdifftools
package:
H = nd.Hessian(loss)(sol.x)
C = np.linalg.inv(H)
# array([[ 5.22047492e-06, -2.50366278e-06],
# [-2.50366278e-06, 1.66914356e-06]])
Which also agrees with covariance matrix provided by curve_fit
.