pythonpython-3.xcurve-fittingscipy-optimizephase

Scipy Optimize CurveFit calculates wrong values


I am interesting in knowing the phase shift between two sine-waves type. For that I am trying to fit each wave with scipy.cuve_fit. I have been following this post. However I obtain negative amplitudes and the phase shift looks like forwarded pi radians sometimes.

The code that I am using is that one below:

def fit_sin_LD(t_LD, y_LD):

'''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''

ff = np.fft.fftfreq(len(t_LD), (t_LD[1]-t_LD[0]))   # assume uniform spacing
Fyy = abs(np.fft.fft(y_LD))
guess_freq = abs(ff[np.argmax(Fyy[1:])+1])   # excluding the zero frequency "peak", which is related to offset
guess_amp = np.std(y_LD) * 2.**0.5
guess_offset = np.mean(y_LD)
guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])

def sinfunc_LD(t_LD, A, w, p, c):
    return A * np.sin(w*t_LD + p) + c
#boundary=([0,-np.inf,-np.pi, 1.5],[0.8, +np.inf, np.pi, 2.5])
popt, pcov = scipy.optimize.curve_fit(sinfunc_LD, t_LD, y_LD, p0=guess, maxfev=3000) # with maxfev= number I can increase the number of iterations
A, w, p, c = popt
f          = w/(2.*np.pi)
fitfunc_LD = lambda t_LD: A*np.sin(w*t_LD + p) + c
fitted_LD  = fitfunc_LD(t_LD)
dic_LD = {"amp_LD": A, "omega_LD": w, "phase_LD": p, "offset_LD": c, "freq_LD": f, "period_LD": 1./f, "fitfunc_LD": fitted_LD, "maxcov_LD": np.max(pcov), "rawres_LD": (guess, popt, pcov)}
return dic_LD


def fit_sin_APD(t_APD, y_APD):

''' Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc" '''

ff = np.fft.fftfreq(len(t_APD), (t_APD[1]-t_APD[0]))   # assume uniform spacing
Fyy = abs(np.fft.fft(y_APD))
guess_freq = abs(ff[np.argmax(Fyy[1:])+1])   # excluding the zero frequency "peak", which is related to offset
guess_amp = np.std(y_APD) * 2.**0.5
guess_offset = np.mean(y_APD)
guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])

def sinfunc_APD(t_APD, A, w, p, c):
    return A * np.sin(w*t_APD + p) + c
#boundary=([0,0,-np.pi, 0.0],[np.inf, np.inf, np.pi, 0.7])
popt, pcov  = scipy.optimize.curve_fit(sinfunc_APD, t_APD, y_APD, p0=guess, maxfev=5000) # with maxfev= number I can increase the number of iterations
A, w, p, c  = popt
f           = w/(2.*np.pi)
fitfunc_APD = lambda t_APD: A*np.sin(w*t_APD + p) + c
fitted_APD  = fitfunc_APD(t_APD)
dic_APD     = {"amp_APD": A, "omega_APD": w, "phase_APD": p, "offset_APD": c, "freq_APD": f, "period_APD": 1./f, "fitfunc_APD": fitted_APD, "maxcov_APD": np.max(pcov), "rawres_APD": (guess, popt, pcov)}       
return dic_APD

I dont understand why curve_fit is returning a negative amplitude (that in terms of physics has not sense). I have tried as well setting boundary conditions as **kwargs* with:

bounds=([0.0, -np.inf,-np.pi, 0.0],[+np.inf, +np.inf,-np.pi, +np.inf])

but it yields a more weird result.

I added an image showing this difference:

enter image description here

Does anyone how to overcome this issue with phases and amplitudes?

Thanks in advance


Solution

  • There are a few issues here that I do not understand:

    1. There is no need to define the fit function inside the "fit function"
    2. There is no need to define it twice if the only difference is the naming of the dictionary. (While I do not understand why this has to be named differently in the first place)
    3. One could directly fit the frequency instead of omega
    4. When pre-calculating the fitted values, directly use the given fitfunction

    Overall I don't see why the second fit should fail and using some generic data here, it doesn't. Considering the fact that in physics an amplitude can be complex I don't have a problem with a negative results. Nevertheless, I understand the point in the OP. Surely, a fit algorithm does not know about physics and, mathematically, there is no problem with the amplitude being negative. This just gives an additional phase shift of pi. Hence, one can easily force positive amplitudes when taking care of the required phase shift. I introduced this here as possible keyword argument. Moreover I reduced this to one fit function with possible "renaming" of the output dictionary keys as keyword argument.

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.optimize import curve_fit
    
    
    def sinfunc( t, A, f, p, c ):
        return A * np.sin( 2.0 * np.pi * f * t + p) + c
    
    def fit_sin(t_APD, y_APD, addName="", posamp=False):
    
        ''' Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc" '''
    
        ff = np.fft.fftfreq( len( t_APD ), t_APD[1] - t_APD[0] ) # assume uniform spacing
        Fyy = abs( np.fft.fft( y_APD ) )
        guess_freq = abs( ff[np.argmax( Fyy[1:] ) + 1] )   # excluding the zero frequency "peak", which is related to offset
        guess_amp = np.std( y_APD ) * 2.**0.5
        guess_offset = np.mean( y_APD )
        guess = np.array( [ guess_amp, guess_freq, 0., guess_offset ] )
    
        popt, pcov  = curve_fit(sinfunc, t_APD, y_APD, p0=guess, maxfev=500) # with maxfev= number I can increase the number of iterations
        if popt[0] < 0 and posamp:
            popt[0] = -popt[0]
            popt[2] += np.pi 
            popt[2] = popt[2] % ( 2 * np.pi )
        A, f, p, c  = popt 
        fitted_APD  = sinfunc( t_APD, *popt )
        dic_APD     = {
                "amp{}".format(addName): A, 
                "omega{}".format(addName): 2.0 * np.pi * f, 
                "phase{}".format(addName): p, 
                "offset{}".format(addName): c, 
                "freq{}".format(addName): f, 
                "period{}".format(addName): 1.0 / f, 
                "fitfunc{}".format(addName): fitted_APD, 
                "maxcov{}".format(addName): np.max( pcov ), 
                "rawres{}".format(addName): ( guess, popt, pcov ) }
        return dic_APD
    
    tl = np.linspace(0,1e-6, 150 )
    sl1 = np.fromiter( (sinfunc(t, .18, 4998735, 3.6, 2.0 ) + .01 *( 1 - 2 * np.random.random() ) for t in tl ), np.float )
    sl2 = np.fromiter( (sinfunc(t, .06, 4998735, 2.1, 0.4 ) + .01 *( 1 - 2 * np.random.random() ) for t in tl ), np.float )
    
    ld = fit_sin(tl, sl1, addName="_ld" )
    print ld["amp_ld"]
    ld = fit_sin(tl, sl1, addName="_ld", posamp=True )
    print ld["amp_ld"]
    apd = fit_sin(tl, sl2 )
    
    fig = plt.figure("1")
    ax = fig.add_subplot( 1, 1, 1 )
    
    ax.plot( tl, sl1, color="r" )
    ax.plot( tl, ld["fitfunc_ld"], color="k", ls="--" )
    ax.plot( tl, sl2, color="#50FF80" )
    ax.plot( tl, apd["fitfunc"], color="k", ls="--" )
    
    ax.grid()
    plt.show()
    

    This gives me:

    -0.180108427200549
    0.180108427200549
    

    i.e. in the first try, despite the good guess for the amplitude, it turns out negative. This is probably due to the large phase. As that guess is zero, it is easier for the algorithm to switch sign of the amplitude first and then adjusting the phase. As mentioned above, this is corrected easily and does not even require error propagation.

    my fits