python scipy data-analysis curve-fitting

good r2 score but huge parameter uncertainty

I'm using a quadratic function to fit my data. I have good R2 score but huge uncertainty in my fitting parameters

Here is the graph and the results:

Graph of fit
R2 score: 0.9698143924536671
uncertainty in a, b, and y0: 116.93787913, 10647.11867972, 116.93787935

How should I intepret this result?

Here is how I defined the quadratic function:

def my_quad(x, a, b, y0):
    return a*(1-x**2/(2*b**2))+ y0

Here's how I calculated the uncertainty for the parameters and R2 score:

popt, pcov = curve_fit(my_quad, x_data,y_data, bounds=([0, 0, -np.inf], [np.inf, np.inf, np.inf]))
a, b, y= popt
err = np.sqrt(np.diag(pcov))
y_pred = my_quad(x_data, *popt)
r2 = r2_score(y_data, y_pred))

Solution

Your model is over-parametrized. You can tell when you expand the polynomial:

a * (1 - x**2     / (2*b**2)) + y0 -> 
     a - x**2 * a / (2*b**2)  + y0 ->
  y0+a - x**2 * a / (2*b**2)

The are only two independent parameters, y0 + a and a / (2*b**2).

You will be able to fit just as well with any two of your original parameters, and then the uncertainty will be reduced significantly.

For example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit

# generate data
rng = np.random.default_rng(23457834572346)
x = np.linspace(-1, 1, 30)
noise = 0.05 * rng.standard_normal(size=x.shape)
y = -2*x**2 + 1 + noise

# over-parameterized fit
def my_quad(x, a, b, y0):
    return a*(1-x**2/(2*b**2))+ y0

bounds=([0, 0, -np.inf], [np.inf, np.inf, np.inf])
popt, pcov = curve_fit(my_quad, x, y, bounds=bounds) 
err = np.sqrt(np.diag(pcov))  # array([3028947.74320428,  544624.83253159, 3028947.74412785])
y_ = my_quad(x, *popt)
r2 = np.corrcoef(y, y_)[0, 1]  # 0.9968876754155439

# remove any one parameter
def my_quad(x, a, b):
    return a*(1-x**2/(2*b**2))

bounds=([0, 0], [np.inf, np.inf])
popt, pcov = curve_fit(my_quad, x, y, bounds=bounds) 
err = np.sqrt(np.diag(pcov))  # array([0.01460553, 0.00260903])
y_ = my_quad(x, *popt)
r2 = np.corrcoef(y, y_)[0, 1]  # 0.9968876754155439

# plot results
plt.plot(x, y, '.')
plt.plot(x, y_, '-')
plt.plot(x, y_2, '--')