I am writing mass spectrometry data reduction software and am trying to fit a set of raw data (y, t), where y is the intensity of a given gas species in the mass spectrometer and t is the timestamp associated with that intensity measurement, to a double natural log function.
I am using the following functions:
import numpy as np
from scipy.optimize import curve_fit

def double_ln_func(t, a, b, p, c, q, d):
    # note: d is accepted here but never used in the expression
    return a + b * np.log(p * t) + c * np.log(q * t)

def double_ln_fit(t, y):
    popt, pcov = curve_fit(double_ln_func, t, y)
    fitted_t = np.linspace(min(t), max(t), 100)
    fitted_y = double_ln_func(fitted_t, *popt)
    return fitted_t, fitted_y, popt
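For context, a typical call, with the time and 4.24 amu columns hand-copied from the first data block below:

# time_sec and 4.24 amu columns from the He 3798 Q1520_ block below
t = np.array([2.691, 8.393, 13.995, 19.595, 25.195, 30.895,
              36.495, 42.095, 47.795, 53.395, 58.996, 64.596])
y = np.array([1.083636e-9, 1.081825e-9, 1.082597e-9, 1.081301e-9,
              1.080510e-9, 1.079783e-9, 1.079611e-9, 1.079237e-9,
              1.078437e-9, 1.078367e-9, 1.077280e-9, 1.077504e-9])

fitted_t, fitted_y, popt = double_ln_fit(t, y)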
Typically, one of the natural log terms is positive to represent ingrowth from memory and the other is negative to represent consumption from ionization.
This generally returns reasonable results:
Data for the example above:
He 3798 Q1520_ 2024-06-25T09:49:42
time_sec    2.22 amu      3.25 amu      4.24 amu      40.27 amu
2.691 1.918947E-9 1.353086E-9 1.083636E-9 2.424798E-11
8.393 1.924420E-9 1.352441E-9 1.081825E-9 2.451031E-11
13.995 1.924468E-9 1.350765E-9 1.082597E-9 2.493171E-11
19.595 1.925282E-9 1.349962E-9 1.081301E-9 2.494261E-11
25.195 1.927887E-9 1.349370E-9 1.080510E-9 2.491214E-11
30.895 1.930570E-9 1.348243E-9 1.079783E-9 2.517592E-11
36.495 1.932480E-9 1.347466E-9 1.079611E-9 2.510602E-11
42.095 1.931847E-9 1.348131E-9 1.079237E-9 2.513002E-11
47.795 1.934760E-9 1.346782E-9 1.078437E-9 2.512803E-11
53.395 1.929853E-9 1.345735E-9 1.078367E-9 2.498039E-11
58.996 1.934834E-9 1.345541E-9 1.077280E-9 2.520110E-11
64.596 1.932654E-9 1.344552E-9 1.077504E-9 2.550219E-11
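(Side note, in case anyone wants to reproduce this: each block is plain whitespace-delimited text with two header lines, so it loads with something like the following; the filename is hypothetical.)

import numpy as np

# "he3798.txt" is a hypothetical file holding the block above
raw = np.loadtxt("he3798.txt", skiprows=2)  # skip the run label and column header
t = raw[:, 0]   # time_sec
y = raw[:, 3]   # the 4.24 amu intensity column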
However, in other cases the fit lines become jagged and no longer resemble the fitted function at all, as in the 4 and 40 amu examples below:
Data for the above example:
He 3800 CB1 2024-06-25T10:19:42
time_sec    2.22 amu      3.25 amu      4.24 amu      40.27 amu
1.616 1.890576E-9 1.359579E-9 7.204719E-13 1.014118E-11
7.218 1.894592E-9 1.357168E-9 7.196407E-13 1.130713E-11
12.819 1.894756E-9 1.355699E-9 7.183293E-13 1.139818E-11
18.519 1.895724E-9 1.354677E-9 7.262793E-13 1.098621E-11
24.119 1.899128E-9 1.354235E-9 7.101848E-13 1.077834E-11
29.729 1.898467E-9 1.353402E-9 7.125748E-13 1.187576E-11
35.430 1.899100E-9 1.351661E-9 7.355859E-13 1.104034E-11
41.032 1.899836E-9 1.352259E-9 7.077889E-13 1.089571E-11
46.633 1.902685E-9 1.351637E-9 7.197934E-13 1.141361E-11
52.235 1.902119E-9 1.351389E-9 7.155005E-13 1.151000E-11
57.835 1.902982E-9 1.350957E-9 7.118936E-13 1.155836E-11
63.535 1.907762E-9 1.351266E-9 7.195685E-13 1.208988E-11
The data above contain a high degree of scatter, and these types of "solutions" appear most often for high-scatter measurements; however, the function fits many other high-scatter cases without resorting to this kind of "solution".
On occasion, I see a warning in the terminal:
OptimizeWarning: Covariance of the parameters could not be estimated
but not often enough to explain every instance of the 'crazy' fits.
For some reason, the 4 amu data is the most sensitive, and more prone to being mis-fit than any other data set.
How is it possible that curve_fit is returning parameters that plot like this, and how can I force it to stick to 'real' solutions? Or perhaps there's a better function than curve_fit in this instance?
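(For what it's worth, curve_fit does take an initial guess and box bounds, which is one way to pin it to a physically plausible region; a sketch, with placeholder numbers rather than tuned values:)

# Sketch only: steer curve_fit with a starting point and box bounds.
# All numbers are illustrative placeholders, in the order a, b, p, c, q, d.
p0 = [1e-9, 1e-10, 1.0, -1e-10, 1.0, 0.0]
bounds = ([-1e-8, -1e-8, 1e-6, -1e-8, 1e-6, -1e-8],
          [ 1e-8,  1e-8, 1e6,   1e-8, 1e6,   1e-8])
popt, pcov = curve_fit(double_ln_func, t, y, p0=p0, bounds=bounds)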
Bah, of course I figured it out right after I wrote this post.
It was a scaling issue, due to the raw intensity numbers (of order 1e-9 down to 1e-13 here) being so low; multiplying them up to order unity before fitting resolves it.
This produces perfectly reasonable fits for most data:
def double_ln_func(t, a, b, p, c, q):
    return a + b * np.log(p * t) - c * np.log(q * t)

def double_ln_fit(t, y):
    y = y * 1e10  # rescale the tiny raw intensities to order unity
    popt, pcov = curve_fit(double_ln_func, t, y)  # , p0=initial_guess
    fitted_t = np.linspace(min(t), max(t), 100)
    fitted_y = double_ln_func(fitted_t, *popt) / 1e10  # undo the scaling
    return fitted_t, fitted_y, popt  # note: popt is still in the scaled units
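The troublesome 4.24 amu column from the second block can then be pushed through the rescaled version the same way (arrays hand-copied from above):

t = np.array([1.616, 7.218, 12.819, 18.519, 24.119, 29.729,
              35.430, 41.032, 46.633, 52.235, 57.835, 63.535])
y = np.array([7.204719e-13, 7.196407e-13, 7.183293e-13, 7.262793e-13,
              7.101848e-13, 7.125748e-13, 7.355859e-13, 7.077889e-13,
              7.197934e-13, 7.155005e-13, 7.118936e-13, 7.195685e-13])

fitted_t, fitted_y, popt = double_ln_fit(t, y)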
It refuses to fit some measurements entirely, but that's a topic for a different post.