pythonscipydata-analysiscurve-fitting

good r2 score but huge parameter uncertainty


I'm using a quadratic function to fit my data. I have good R2 score but huge uncertainty in my fitting parameters

Here is the graph and the results:

Graph of fit
R2 score: 0.9698143924536671
uncertainty in a, b, and y0: 116.93787913, 10647.11867972, 116.93787935

How should I intepret this result?

Here is how I defined the quadratic function:

def my_quad(x, a, b, y0):
    return a*(1-x**2/(2*b**2))+ y0

Here's how I calculated the uncertainty for the parameters and R2 score:

popt, pcov = curve_fit(my_quad, x_data,y_data, bounds=([0, 0, -np.inf], [np.inf, np.inf, np.inf]))
a, b, y= popt
err = np.sqrt(np.diag(pcov))
y_pred = my_quad(x_data, *popt)
r2 = r2_score(y_data, y_pred))

Solution

  • Your model is over-parametrized. You can tell when you expand the polynomial:

    a * (1 - x**2     / (2*b**2)) + y0 -> 
         a - x**2 * a / (2*b**2)  + y0 ->
      y0+a - x**2 * a / (2*b**2)
    

    The are only two independent parameters, y0 + a and a / (2*b**2).

    You will be able to fit just as well with any two of your original parameters, and then the uncertainty will be reduced significantly.

    For example:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    from scipy.optimize import curve_fit
    
    # generate data
    rng = np.random.default_rng(23457834572346)
    x = np.linspace(-1, 1, 30)
    noise = 0.05 * rng.standard_normal(size=x.shape)
    y = -2*x**2 + 1 + noise
    
    # over-parameterized fit
    def my_quad(x, a, b, y0):
        return a*(1-x**2/(2*b**2))+ y0
    
    bounds=([0, 0, -np.inf], [np.inf, np.inf, np.inf])
    popt, pcov = curve_fit(my_quad, x, y, bounds=bounds) 
    err = np.sqrt(np.diag(pcov))  # array([3028947.74320428,  544624.83253159, 3028947.74412785])
    y_ = my_quad(x, *popt)
    r2 = np.corrcoef(y, y_)[0, 1]  # 0.9968876754155439
    
    # remove any one parameter
    def my_quad(x, a, b):
        return a*(1-x**2/(2*b**2))
    
    bounds=([0, 0], [np.inf, np.inf])
    popt, pcov = curve_fit(my_quad, x, y, bounds=bounds) 
    err = np.sqrt(np.diag(pcov))  # array([0.01460553, 0.00260903])
    y_ = my_quad(x, *popt)
    r2 = np.corrcoef(y, y_)[0, 1]  # 0.9968876754155439
    
    # plot results
    plt.plot(x, y, '.')
    plt.plot(x, y_, '-')
    plt.plot(x, y_2, '--')
    

    enter image description here