I am solving a constrained, nonlinear optimization problem with scipy.optimize.minimize using the SLSQP solver. The optimization converges properly, but I want a better understanding of how to interpret the OptimizeResult output, specifically nit, nfev, and njev. The scipy documentation defines them as follows:

nit: Number of iterations performed by the optimizer.
nfev, njev: Number of evaluations of the objective function and of its Jacobian.

And here is an example output from the optimizer which I would like to understand:
message: Optimization terminated successfully
success: True
status: 0
fun: -0.2498255944127068
x: [ 6.087e-02 7.000e-02 7.000e-02 7.000e-02 7.000e-02
7.000e-02 7.000e-02 7.000e-02]
nit: 6
jac: [-4.197e-02 -8.534e-02 -2.353e-02 -1.421e-02 -8.549e-02
-5.721e-02 -1.725e-02 -4.846e-03]
nfev: 54
njev: 6
My questions are:
Why are there far more function evaluations than Jacobian evaluations? From my understanding, there should be a Jacobian evaluation with each update step.
A major reason why this can happen is numerical differentiation. If a method requires a Jacobian and you don't provide one, it approximates the Jacobian with finite differences. SLSQP is one of the methods that uses Jacobians.
Suppose you have a function of two variables, x and y, and that h is a small constant. The Jacobian is then approximated by evaluating your function at f(x, y), f(x + h, y), and f(x, y + h). Those calls are included in the total number of function evaluations reported by nfev. In this example, each Jacobian evaluation adds 1 to njev and up to 3 function evaluations to nfev (the value at the base point f(x, y) may already be available and get reused).
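As a rough sketch of what that finite-difference step looks like (a made-up 2-D quadratic and a hand-rolled forward difference, not SLSQP's internal code):

import numpy as np

def f(x, y):
    # hypothetical 2-D objective, just for illustration
    return (x - 1.0) ** 2 + (y - 2.0) ** 2

def forward_difference_jacobian(x, y, h=1e-8):
    # one "Jacobian evaluation" built from the three function evaluations
    # mentioned above: f(x, y), f(x + h, y) and f(x, y + h)
    f0 = f(x, y)
    dfdx = (f(x + h, y) - f0) / h
    dfdy = (f(x, y + h) - f0) / h
    return np.array([dfdx, dfdy])

print(forward_difference_jacobian(0.0, 0.0))  # approximately [-2., -4.]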
Another reason is that SLSQP doesn't only use one function evaluation per Jacobian, even if you provide a Jacobian.
Here's a little test program I wrote, which minimizes the Rosenbrock test function and prints every call the optimizer makes.
from scipy.optimize import minimize, rosen, rosen_der

x0 = [0, 0, 0]

def my_fun(x):
    # objective: log the point, then evaluate the Rosenbrock function
    print("fun", x)
    return rosen(x)

def my_jac(x):
    # gradient: log the point, then evaluate the analytic derivative
    print("jac", x)
    return rosen_der(x)

res = minimize(my_fun, x0, jac=my_jac, method='SLSQP')
print(res)
Output:
fun [0. 0. 0.]
jac [0. 0. 0.]
fun [2. 2. 0.]
fun [0.2 0.2 0. ]
fun [0.02857143 0.02857143 0. ]
jac [0.02857143 0.02857143 0. ]
fun [4.62639615 0.23806354 0.12189813]
fun [0.4883539 0.04952064 0.01218981]
fun [0.08780646 0.03127037 0.00157045]
jac [0.08780646 0.03127037 0.00157045]
fun [45.78724867 2.11162674 -5.08696515]
fun [ 4.65775068 0.23930601 -0.50728311]
fun [ 0.54480088 0.05207394 -0.04931491]
fun [ 0.1335059 0.03335073 -0.00351809]
jac [ 0.1335059 0.03335073 -0.00351809]
fun [115.19049536 5.99562843 1.02242671]
fun [11.63920485 0.6295785 0.09907639]
fun [1.2840758 0.09297351 0.00674136]
fun [ 0.24856289 0.03931301 -0.00249214]
jac [ 0.24856289 0.03931301 -0.00249214]
fun [ 0.27545629 0.05670601 -0.01992538]
jac [ 0.27545629 0.05670601 -0.01992538]
fun [ 0.28277162 0.07403693 -0.01267809]
jac [ 0.28277162 0.07403693 -0.01267809]
fun [0.47710742 0.19027162 0.02664537]
jac [0.47710742 0.19027162 0.02664537]
fun [0.49440281 0.24085986 0.05179501]
jac [0.49440281 0.24085986 0.05179501]
fun [0.67905328 0.42674455 0.12341217]
fun [0.57628947 0.32329386 0.08355496]
jac [0.57628947 0.32329386 0.08355496]
fun [0.69247876 0.45346755 0.17285386]
jac [0.69247876 0.45346755 0.17285386]
fun [0.69010062 0.47653568 0.22499494]
jac [0.69010062 0.47653568 0.22499494]
fun [0.75933513 0.57446721 0.31655777]
jac [0.75933513 0.57446721 0.31655777]
fun [0.85114905 0.7019671 0.46125572]
fun [0.80422492 0.63680463 0.38730369]
jac [0.80422492 0.63680463 0.38730369]
fun [0.8339492 0.68661815 0.46324688]
jac [0.8339492 0.68661815 0.46324688]
fun [0.88524572 0.78398497 0.59548244]
jac [0.88524572 0.78398497 0.59548244]
fun [0.89346106 0.80657631 0.6553568 ]
jac [0.89346106 0.80657631 0.6553568 ]
fun [0.94772781 0.8892977 0.78436098]
jac [0.94772781 0.8892977 0.78436098]
fun [0.94255758 0.88753884 0.7848004 ]
jac [0.94255758 0.88753884 0.7848004 ]
fun [0.96068296 0.92412895 0.84894826]
jac [0.96068296 0.92412895 0.84894826]
fun [0.98238831 0.96481404 0.92927089]
jac [0.98238831 0.96481404 0.92927089]
fun [0.99221044 0.98410508 0.96738135]
jac [0.99221044 0.98410508 0.96738135]
fun [0.99683275 0.99381796 0.98847377]
jac [0.99683275 0.99381796 0.98847377]
fun [1.00297977 1.00587295 1.01117023]
fun [1.00018303 1.00038824 1.00084392]
jac [1.00018303 1.00038824 1.00084392]
fun [1.00004687 1.00009601 1.0002024 ]
message: Optimization terminated successfully
success: True
status: 0
fun: 2.2666304652949757e-08
x: [ 1.000e+00 1.000e+00 1.000e+00]
nit: 24
jac: [-8.491e-03 -2.172e-02 1.346e-02]
nfev: 38
njev: 24
In this output, the optimizer sometimes makes four function calls before the next Jacobian call. It appears to be using those extra function calls to perform a line search along the proposed step.
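If you want the counts without the wall of printed calls, a small counting wrapper (my own helper, not part of scipy) lets you compare your own tally against what OptimizeResult reports:

from scipy.optimize import minimize, rosen, rosen_der

def counted(f):
    # wrap a function so it keeps a tally of how often it is called
    def wrapper(x):
        wrapper.calls += 1
        return f(x)
    wrapper.calls = 0
    return wrapper

fun = counted(rosen)
jac = counted(rosen_der)
res = minimize(fun, [0, 0, 0], jac=jac, method='SLSQP')

print("counted: ", fun.calls, jac.calls)
print("reported:", res.nfev, res.njev, res.nit)

The two sets of numbers should agree: nfev with the objective tally, njev with the gradient tally.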
Do these convergence metrics give any indication of how stable this solution is
No. For example, the function lambda x: (10 - x[0] - x[1]) ** 2
doesn't have a unique solution: any combination of values that sums to 10 achieves the minimum. As a result, if you pass in a different x0, you'll get a different answer. However, the output of minimize()
does not tell you this.
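A quick sketch of that degeneracy (the starting points are arbitrary):

from scipy.optimize import minimize

degenerate = lambda x: (10 - x[0] - x[1]) ** 2

a = minimize(degenerate, x0=[0, 0], method='SLSQP')
b = minimize(degenerate, x0=[7, 1], method='SLSQP')

# Both runs report success and an objective value of (nearly) zero,
# but they stop at different points on the line x[0] + x[1] = 10.
print(a.x, a.fun)
print(b.x, b.fun)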
or how quickly it converged?
Yes. Assuming you're working on a problem where the minimizer spends most of its time calling your objective function, which is very common, a 2x reduction in objective-function calls corresponds pretty directly to a 2x reduction in how long it takes to solve the problem.
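A rough way to see that relationship (artificially slowing the Rosenbrock function down to mimic an expensive objective; the 10 ms figure is arbitrary):

import time
from scipy.optimize import minimize, rosen

def slow_rosen(x):
    time.sleep(0.01)  # pretend each objective call is expensive (e.g. a simulation)
    return rosen(x)

t0 = time.perf_counter()
res = minimize(slow_rosen, [0, 0, 0], method='SLSQP')
elapsed = time.perf_counter() - t0

# With an expensive objective, total runtime is roughly nfev * cost per call,
# so halving nfev roughly halves the solve time.
print(res.nfev, elapsed, elapsed / res.nfev)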
In general, what can I learn about my optimization problem from these metrics alone, and what else may I have to dive deeper into in order to understand the objective landscape?
What you can learn is how hard the optimizer has to work to solve the problem. Sometimes, you have multiple equivalent ways that you can parameterize a problem, and some of those are easier to solve than others. I give a worked example of that here.
The second half of the question is too broad to answer fully; there are a lot of things these metrics don't tell you. For example, is this a problem with multiple local minima, where a local optimizer might get stuck? These statistics can't tell you that.
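As one illustration of that blind spot (a multimodal toy function of my own choosing):

import numpy as np
from scipy.optimize import minimize

# several local minima of different depth
multimodal = lambda x: np.cos(3 * x[0]) + 0.1 * x[0] ** 2

a = minimize(multimodal, x0=[0.5], method='SLSQP')
b = minimize(multimodal, x0=[3.0], method='SLSQP')

# Both runs report success with nothing unusual in nit/nfev/njev,
# yet they settle into different local minima with different objective values.
print(a.x, a.fun, a.nit, a.nfev)
print(b.x, b.fun, b.nit, b.nfev)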