pythoncurve-fittinglmfit

Omit values when fitting (lmfit nan_policy)


Say I have a signal with an underlaying gauss and some noise. Fitting is of course no problem:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import lmfit
from lmfit.models import *

x = np.arange(100)
gauss = np.array(norm.pdf(x, 50, 10))
noise = np.random.normal(0, 0.001, 100)
signal = gauss + noise
plt.plot(x, signal)

model = GaussianModel()

model.set_param_hint('amplitude', value = 0.5, min = 0.1, max = 10)
model.set_param_hint('center', value = 50, min = 20, max = 80)
model.set_param_hint('sigma', value = 10, min = 1, max = 50)
params = model.make_params()

fit = model.fit(signal, params, x = x)

plt.plot(x, signal)
plt.plot(x, fit.best_fit)

enter image description here

Now say for some arbitrary reason my signal is cutoff. Some part of my gaussian shaped signal is missing (some saturation effect). Fitting with the same procedure now leads to a wrong fit. It makes sense since the saturation values are taken into consideration.

signal_cutoff = [value if value < 0.025 else 0.025 for value in signal]

fit2 = model.fit(signal_cutoff, params, x = x)
plt.plot(x, signal_cutoff)
plt.plot(x, fit2.best_fit)
plt.ylim(0, 0.04)

enter image description here

My idea how to avoid this was to replace the saturation values with None and then perform the fit with the nan_policy set to propagate:

signal_cutoff_omitted = [value if value < 0.025 else None for value in signal_cutoff]
fit3 = model.fit(signal_cutoff_omitted, params, x = x, nan_policy = 'propagate')
plt.plot(x, signal_cutoff_omitted)
plt.plot(x, fit3.best_fit)
plt.ylim(0, 0.04)

enter image description here

As one can see the fit is just the initial values which is also shown in the fit report. Not sure what I am doing wrong or if this is even the right approach. Any ideas how to achieve what I am trying to do in an elegant way?


Solution

  • Remove the points from the data to fit. Perhaps something like:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    import lmfit
    from lmfit.models import GaussianModel
    
    x = np.arange(100)
    gauss = np.array(norm.pdf(x, 50, 10))
    noise = np.random.normal(0, 0.001, 100)
    signal = gauss + noise
    
    # get indices for data points that are not saturated
    not_saturated = np.where(signal < 0.025)[0]
    
    x_good = x[not_saturated]
    s_good = signal[not_saturated]
    
    
    plt.plot(x, signal, '#AABBCC', marker='+', linewidth=0, label='all data')
    plt.plot(x_good, s_good, '#880000', marker='o', linewidth=0, label='good data')
    
    model = GaussianModel()
    params = model.make_params(amplitude=0.5, center=48, sigma=15)
    
    result = model.fit(s_good, params, x=x_good)
    print(result.fit_report())
    plt.plot(x_good, result.best_fit, 'k', label='fit')
    
    # predict for saturated data
    predict = result.eval(result.params, x=x)
    plt.plot(x, predict, 'g', label='predict for saturated x')
    
    plt.legend()
    plt.show()
    

    which will print a report of:

    [[Model]]
        Model(gaussian)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 26
        # data points      = 81
        # variables        = 3
        chi-square         = 7.9661e-05
        reduced chi-square = 1.0213e-06
        Akaike info crit   = -1114.40646
        Bayesian info crit = -1107.22312
    [[Variables]]
        amplitude:  0.98074785 +/- 0.01877254 (1.91%) (init = 0.5)
        center:     49.9665431 +/- 0.11026782 (0.22%) (init = 48)
        sigma:      9.95029883 +/- 0.16996484 (1.71%) (init = 15)
        fwhm:       23.4311627 +/- 0.40023660 (1.71%) == '2.3548200*sigma'
        height:     0.03932161 +/- 0.00129373 (3.29%) == '0.3989423*amplitude/max(1e-15, sigma)'
    [[Correlations]] (unreported correlations are < 0.100)
        C(amplitude, sigma) = -0.649
    

    and generate a plot like this:

    enter image description here