python, numpy, scipy-optimize, lmfit, masked-array

How to implement masked-array fitting using lmfit (previously using the sigma option of curve_fit)


I am trying to implement masked-array fitting using lmfit. Previously I was using curve_fit, as in the code snippet below. I have omitted the body of def spec because of its length; essentially, spec is a function that performs 50 interpolations of templates.

Both the observation data and the templates have the file structure x = wavelength, y = flux, from x = 500 to x = 550 nm.

The observation data is a mixture of two spectra, each with different parameters.

The file lambda_input is a list of wavelengths (example: list of wavelengths[1]) to be used in the fitting process. The fit is only to be carried out in these wavelength ranges (+/- the line width, lw).

Using sigma=weight works with curve_fit, but now that I need to use lmfit I am struggling to find a method that works.

How can I mask both the observation and template wavelengths to lambda_input in order to use lmfit?

Thank you

import pandas as pd
import numpy as np
from scipy import interpolate
from scipy.optimize import curve_fit
import lmfit

i_run = pd.read_csv("run.txt",delimiter='\t')
i_run.columns = ["wave","flux"]
run_w = np.array(i_run['wave'])
run_f = np.array(i_run['flux']) # observation data

lambda_input = pd.read_excel("../lambda/weights.xlsx") #fitting wavelengths

v1 = 5 # conversion factors
v2 = 6
lw = 0.01 # linewidths (+/- fitting wavelength)

# Build a sigma-style array: 1 inside the fitting windows, 1e6 everywhere else
weight = np.zeros(len(run_w))
for v in (v1, v2):
    for w in range(0, 187):
        n = lambda_input.iat[w, 1]
        centre = n * (1 + v)
        weight += np.where((run_w >= centre - lw) & (run_w <= centre + lw), 1, 0)
weight = np.select([weight > 0, weight == 0], [1, 1e6])

p1 = 750
p2 = 800
p3 = 1.0
p4 = 20.0
p5 = 30.0
p6 = 0.5
p7 = 0.5
p8 = 100
p9 = -100
p10 = 4.0
p11 = 4.0
p12 = 3.0
p13 = 3.0

def spec(run_w, t1, t2, r, v1, v2, m1, m2, r1, r2, l1, l2, mi1, mi2):
    # body omitted for length: performs 50 interpolations of templates
    return spectrum

popt, pcov = curve_fit(spec, xdata=run_w, ydata=run_f, sigma=weight,
                       p0=[p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13])

model = lmfit.Model(spec) #fitting: lmfit
p = model.make_params(t1=p1,t2=p2,r=p3,v1=p4,v2=p5,m1=p6,m2=p7,r1=p8,r2=p9,l1=p10,l2=p11,mi1=p12,mi2=p13)
fit = model.fit(data=run_f, params=p, run_w=run_w, method='nelder', nan_policy='omit')
lmfit.report_fit(fit)
fit.plot()


  [1]: https://i.sstatic.net/9wAb2.png

Solution

  • With lmfit.Model, weighting the elements of the data array is done by passing weights to the lmfit.Model.fit() method; this changes the quantity minimized in the least-squares sense from data-model to weights*(data-model). See https://lmfit.github.io/lmfit-py/model.html#lmfit.model.ModelResult.fit

    Ironically, you already have a weight array, but use it as sigma in curve_fit. To use it with lmfit.Model.fit(), you would want 1.0/weight, or modify your weight to something like:

    weight = np.select([weight>0, weight==0], [1, 1.e-6])
    

    and then use with

    fit = model.fit(data=run_f, params=p, run_w=run_w, 
                    weights=weight, method='nelder', nan_policy='omit')
    lmfit.report_fit(fit)
    

    I would make a couple more suggestions:

    1. don't use method='nelder' as the fitting method unless you know you need it (and if so, ask yourself why you didn't use it with curve_fit). The default fitting method (leastsq) is highly recommended, especially to start.
    2. If your data has NaNs or Infs, it is much better to remove them before doing the fit than to rely on nan_policy. If your model generates NaNs or Infs, nan_policy won't help at all - those would have to be checked for and resolved.
    3. a really key feature of lmfit.Model is that parameters are named so as to be meaningful to the person running and interpreting the fit. Having 13 parameters with names like t1, t2, v1, v2, m1, m2, etc. is going to be hard to interpret in the future. Names like "time1", "theta1", "velo1", "mass1", "mean1", etc. will be much clearer.
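For the second suggestion, dropping non-finite points before fitting can be done with a single boolean mask (illustrative arrays, not your data):

```python
import numpy as np

run_w = np.array([500.0, 500.5, np.nan, 501.5, np.inf, 502.5])
run_f = np.array([1.0, np.nan, 0.8, 0.9, 1.1, 0.7])

# keep only points where both wavelength and flux are finite
good = np.isfinite(run_w) & np.isfinite(run_f)
run_w, run_f = run_w[good], run_f[good]
print(run_w)   # → [500.  501.5 502.5]
```

Any weight array would need the same mask applied so all three arrays stay aligned.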