python scipy least-squares levenberg-marquardt

How to calculate "relative error in the sum of squares" and "relative error in the approximate solution" from least squares method?

I have implemented a 3D gaussian fit using scipy.optimize.leastsq and now I would like to tweak the arguments ftol and xtol to optimize the performances. However, I don't understand the "units" of these two parameters in order to make a proper choice. Is it possible to calculate these two parameters from the results? That would give me an understanding of how to choose them. My data is numpy arrays of np.uint8. I tried to read the FORTRAN source code of MINIPACK but my FORTRAN knowledge is zero. I also read checked the Levenberg-Marquardt algorithm, but I could not really get a number that was below the ftol for example.

Here is a minimal example of what I do:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

class gaussian_model:
    def __init__(self):
        self.prev_iter_model = None
        self.f_vals = []

    def gaussian_1D(self, coeffs, xx):
        A, sigma, mu = coeffs
        # Center rotation around peak center
        x0 = xx - mu
        model = A*np.exp(-(x0**2)/(2*(sigma**2)))
        return model

    def residuals(self, coeffs, I_obs, xx, model_func):
        model = model_func(coeffs, xx)
        residuals = I_obs - model
        if self.prev_iter_model is not None:
            self.f = np.sum(((model-self.prev_iter_model)/model)**2)
            self.f_vals.append(self.f)
        self.prev_iter_model = model
        return residuals


# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)

# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise

# LM Least squares
initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
                                              args=(yy, xx, model.gaussian_1D),
                                              ftol=1E-6, full_output=True)
yy_fit = model.gaussian_1D(pred_coeffs, xx)

rel_SSD = np.sum(((yy-yy_fit)/yy)**2)
RMS_SSD = np.sqrt(rel_SSD/num)

print(RMS_SSD)
print(model.f)
print(model.f_vals)

fig, ax = plt.subplots(1,2)

# Plot results
ax[0].scatter(xx, yy)
ax[0].plot(xx, yy_fit, c='r')

ax[1].scatter(range(len(model.f_vals)), model.f_vals, c='r')

# ax[1].set_ylim(0, 1E-6)

plt.show()

rel_SSD is around 1 and definitely not something below ftol = 1E-6.

EDIT: Based on @user12750353 answer below I updated my minimal example to try to recreate how lmdif determines termination with ftol. The problem is that my f_vals are too small, so they are not the right values. The reason I would like to recreate this is that I would like to see what kind of numbers I am getting on my main code to decide on a ftol that would terminate the fitting process earlier.

Solution

Since you are giving a function without the gradient, the method called is lmdif. Instead of gradients it will use forward difference gradient estimate, f(x + delta) - f(x) ~ delta * df(x)/dx (I will write as if the parameter).

There you find the following description

c       ftol is a nonnegative input variable. termination
c         occurs when both the actual and predicted relative
c         reductions in the sum of squares are at most ftol.
c         therefore, ftol measures the relative error desired
c         in the sum of squares.
c
c       xtol is a nonnegative input variable. termination
c         occurs when the relative error between two consecutive
c         iterates is at most xtol. therefore, xtol measures the
c         relative error desired in the approximate solution.

Looking in the code the actual reduction acred = 1 - (fnorm1/fnorm)**2 is what you calculated for rel_SSD, but between the two last iterations, not between the fitted function and the target points.

Example

The problem here is that we need to discover what are the values assumed by the internal variables. An attempt to do so is to save the coefficients and the residual norm every time the function is called as follows.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

class gaussian_model:
    def __init__(self):
        self.prev_iter_model = None
        self.fnorm = []
        self.x = []

    def gaussian_1D(self, coeffs, xx):
        A, sigma, mu = coeffs
        # Center rotation around peak center
        x0 = xx - mu
        model = A*np.exp(-(x0**2)/(2*(sigma**2)))
        grad = np.array([
            model / A,
            model * x0**2 / (sigma**3),
            model * 2 * x0 / (2*(sigma**2))
        ]).transpose();
        
        return model, grad

    def residuals(self, coeffs, I_obs, xx, model_func):
        model, grad = model_func(coeffs, xx)
        residuals = I_obs - model
        self.x.append(np.copy(coeffs));
        self.fnorm.append(np.sqrt(np.sum(residuals**2)))
        return residuals
    
    def grad(self, coeffs, I_obs, xx, model_func):
        model, grad = model_func(coeffs, xx)
        residuals = I_obs - model
        return -grad
    
    def plot_progress(self):
        x = np.array(self.x)
        dx = np.sqrt(np.sum(np.diff(x, axis=0)**2, axis=1))
        plt.plot(dx / np.sqrt(np.sum(x[1:, :]**2, axis=1)))
        fnorm = np.array(self.fnorm)
        plt.plot(1 - (fnorm[1:]/fnorm[:-1])**2)
        plt.legend(['$||\Delta f||$', '$||\Delta x||$'], loc='upper left');
        
# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)

# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy, _ = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise

Then we can see the relative variation of $x$ and $f$

initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
                                              args=(yy, xx, model.gaussian_1D),
                                              xtol=1e-6,
                                              ftol=1e-6, full_output=True)

plt.figure(figsize=(14, 6))
plt.subplot(121)
model.plot_progress()
plt.yscale('log')
plt.grid()
plt.subplot(122)

yy_fit,_ = model.gaussian_1D(pred_coeffs, xx)
# Plot results
plt.scatter(xx, yy)
plt.plot(xx, yy_fit, c='r')
plt.show()

The problem with this is that the function is evaluated both to compute f and to compute the gradient of f. To produce a cleaner plot what can be done is to implement pass Dfun so that it evaluate func only once per iteration.

# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)

# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy, _ = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise

# LM Least squares
initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
                                              args=(yy, xx, model.gaussian_1D),
                                              Dfun=model.grad,
                                              xtol=1e-6,
                                              ftol=1e-6, full_output=True)

plt.figure(figsize=(14, 6))
plt.subplot(121)
model.plot_progress()
plt.yscale('log')
plt.grid()
plt.subplot(122)

yy_fit,_ = model.gaussian_1D(pred_coeffs, xx)
# Plot results
plt.scatter(xx, yy)
plt.plot(xx, yy_fit, c='r')
plt.show()

Well, the value I am obtaining for xtol is not exactly what is in the lmdif implementation.