I want to find the local minimum for a function that depends on 2 variables. For that my plan was to use the scipy.optimize.minimize function with the "newton-cg" method because I can calculate the jacobian and hessian analytically.
However, when my starting guess is exactly on a local maximum, the function terminates successfully in the first iteration step right on top of that maximum, even though the Hessian is negative definite there. I've written a short test script that reproduces the issue:
import numpy as np
import scipy.optimize as o

def get_f_df(var):
    x = var[0]
    y = var[1]
    f = np.cos(x) + np.cos(y)
    df_dx = -np.sin(x)
    df_dy = -np.sin(y)
    return f, (df_dx, df_dy)

def hess(var):
    x = var[0]
    y = var[1]
    f_hess = np.zeros((2, 2))
    f_hess[0, 0] = -np.cos(x)
    f_hess[1, 1] = -np.cos(y)
    return f_hess

min = o.minimize(get_f_df, (0, 0), jac=True, hess=hess, method="newton-cg")
print(min)
The outcome then is:
message: Optimization terminated successfully.
success: True
status: 0
fun: 2.0
x: [ 0.000e+00 0.000e+00]
nit: 1
jac: [-0.000e+00 -0.000e+00]
nfev: 1
njev: 1
nhev: 0
The same outcome occurs if I use hess=None, hess='cs', hess='2-point' or hess='3-point' instead of my custom hess function. Also, if I use other methods like 'dogleg', 'trust-ncg', 'trust-krylov', 'trust-exact' or 'trust-constr', I basically get the same outcome, except that nhev = 1; the result is still wrong with x = [0, 0].
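For completeness, this is roughly how I checked the other methods (just an illustrative loop, reusing get_f_df and hess from above):

# All gradient/Hessian-based methods I tried stop right at the start guess
# (0, 0), because the gradient there is exactly zero.
for method in ["newton-cg", "dogleg", "trust-ncg", "trust-krylov",
               "trust-exact", "trust-constr"]:
    res = o.minimize(get_f_df, (0, 0), jac=True, hess=hess, method=method)
    print(method, res.x, res.fun)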
Either I am doing something terribly wrong here (definitely very likely) or there is a major problem with the minimize function, specifically the "newton-cg" method (very unlikely?).
Regarding the latter case, I also checked the source code to see if something's wrong there and stumbled upon something that seems a bit odd. However, I don't completely understand the whole code, so I am a bit unsure whether my worries are legitimate:
When the minimize function is called with method="newton-cg", it jumps into the _minimize_newtoncg function (see source code here). I want to go through what I believe happens there in detail:
On line 2168, A = sf.hess(xk), the Hessian is first calculated at xk, which at first is the start guess x0. For my test case the Hessian is of course A = [[fxx, fxy], [fxy, fyy]], with fij being the derivative of f with respect to i and j. In my case fxy = fyx also holds.
Next, on line 2183, Ap = A.dot(psupi), the product of the Hessian A and psupi is calculated. psupi is basically equal to b, which is the negative gradient of f at xk. So Ap = A.dot(psupi) results in Ap = [-(fxx*fx + fxy*fy), -(fxy*fx + fyy*fy)].
Next, the curvature curv is calculated on line 2186 by np.dot(psupi, Ap). As explained above, psupi is the negative gradient of f, so this results in curv = fxx*fx^2 + 2*fxy*fx*fy + fyy*fy^2.
However, all of these derivatives are evaluated at xk, which at first is equal to the starting guess x0. If the starting guess is exactly at a local maximum, the derivatives fx and fy are equal to 0, and because of this curv = 0. This triggers a break out of the for loop on the next line, thus skipping the update of xsupi, psupi and all the other parameters. Therefore, pk becomes [0, 0] and _line_search_wolfe12 is called with basically all start parameters. This is where my understanding of the source code stops; however, I feel like things already went wrong with curv = 0 and the break out of the for loop.
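To illustrate, here is a small standalone snippet mimicking what I believe those lines do at my start guess (my own reconstruction, not the actual scipy internals):

import numpy as np

xk = np.array([0.0, 0.0])                            # start guess, a local maximum of cos(x) + cos(y)
grad = np.array([-np.sin(xk[0]), -np.sin(xk[1])])    # analytic gradient, exactly zero here
A = np.array([[-np.cos(xk[0]), 0.0],
              [0.0, -np.cos(xk[1])]])                # analytic Hessian

b = -grad              # right-hand side of the inner CG problem
psupi = b              # initial search direction
Ap = A.dot(psupi)
curv = np.dot(psupi, Ap)
print(curv)            # 0.0 -> the CG loop breaks immediately and pk stays [0, 0]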
Since I got a bit of feedback and the question arose as to why I don't just use another start guess, I want to give a short explanation of what my actual goal is. Maybe it helps you help me.
I want to simulate magnetic hysteresis loops using the macrospin model. What I need to do for that is to find the local minimum in the energy landscape for each external magnetic field step starting from saturation. There, the angle between the magnetic macrospin and the external magnetic field is 0°. In saturation they are in an energetic minimum. If I reduce the external magnetic field, I have to take the angle from the field step before that as a new starting guess. As the external magnetic field is reduced, the local minimum in the energy landscape in saturation transforms into a local maximum.
At first, I ran the local minimization for the angle from the last field value and +/- a small increment. My idea was that this should result in the same value as long as the start is not on a local maximum; I would then take the value which found the lowest minimum. For some reason I do not understand yet, the increment value I chose had a huge impact on the outcome. My increment values were usually in the 0.0001-0.01 range, with my angle being in radians (-3.141 to 3.141). Therefore, I kinda scrapped that idea.
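In rough terms, that first attempt looked something like this (a sketch only; energy_and_grad is a made-up stand-in for my actual energy function, returning (energy, gradient) like get_f_df above):

import scipy.optimize as o

def multistart_minimize(energy_and_grad, angle_prev, increment=1e-3):
    # Start from the previous angle and from +/- increment around it,
    # then keep whichever run found the lowest energy.
    best = None
    for start in (angle_prev - increment, angle_prev, angle_prev + increment):
        res = o.minimize(energy_and_grad, [start], jac=True, method="newton-cg")
        if best is None or res.fun < best.fun:
            best = res
    return best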
Next, I guess I'll try checking only whether I am indeed on a local maximum, and maybe rather use the gradient instead of the final energy value to decide on the direction. If that works, I'll update this here.
Update: If I sit on a local maximum or saddle point, I now check the gradients at +/- increment and pick the position with the highest gradient as the new start guess. This kinda works, but the exact increment value again has more influence on the outcome than I'd like. I guess I'll have to try to find an ideal value which works for most conditions.
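For reference, this is roughly what that check looks like (again a sketch; energy_and_grad is the same hypothetical stand-in as above):

import numpy as np

def restart_guess(energy_and_grad, angle_prev, increment=1e-3, tol=1e-12):
    # If the previous angle sits on a stationary point (zero gradient),
    # probe +/- increment and restart from the side with the larger gradient.
    _, g0 = energy_and_grad([angle_prev])
    if abs(g0[0]) > tol:
        return angle_prev                      # gradient non-zero, keep the old guess
    candidates = (angle_prev - increment, angle_prev + increment)
    grads = [abs(energy_and_grad([a])[1][0]) for a in candidates]
    return candidates[int(np.argmax(grads))]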
(The derivative-free solvers suggested by Matt Haberland also did not really seem to help me. They kinda worked but also kinda didn't.)
The solution is stuck at a local maximum because you are using a gradient-based solver with a guess at a point where the gradient is exactly zero. The guess satisfies the condition for successful termination, so the solver stops.
A simple solution is to perturb the guess:
min = o.minimize(get_f_df, (1e-6, 1e-6), jac=True, hess=hess, method="newton-cg")
print(min)
# message: Optimization terminated successfully.
# success: True
# status: 0
# fun: -1.9999999999999998
# x: [ 3.142e+00 3.142e+00]
# nit: 6
# jac: [-1.147e-08 -1.147e-08]
# nfev: 36
# njev: 36
# nhev: 6
Of course, you could try a derivative-free solver. powell and cobyla don't use derivatives, and they both solve your problem with the same guess. But the better option is to change the guess.
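For instance, reusing get_f_df from the question (powell only needs the function value, so the gradient is simply discarded here):

# Derivative-free alternative: Powell's line searches move off the maximum.
f_only = lambda var: get_f_df(var)[0]
res = o.minimize(f_only, (0, 0), method="powell")
print(res.x, res.fun)   # should land in a nearby minimum instead of staying at (0, 0)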
Either I am doing something terribly wrong here (definitely very likely) or there is a major problem with the minimize function, specifically the "newton-cg" method (very unlikely?).
I believe you've already filed a bug report. Presumably you expected the method to check whether the Hessian is positive definite or not, and it may be surprising that it doesn't require that to report success. Over at the repo, it may be considered a bug or a shortcoming of the algorithm, but you will find that this sort of thing is not unusual for this sort of method, since it is rarely an impediment to solving a problem in practice. In any case, it isn't a problem with the way you've defined the problem - just the guess or choice of method.
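If you want to guard against this yourself, one option (a sketch, not anything built into scipy; the helper name and restart logic are my own) is to check the Hessian eigenvalues at the returned point and restart from a slightly perturbed guess whenever it is not positive semidefinite:

import numpy as np
import scipy.optimize as o

def minimize_avoiding_maxima(fun, x0, hess, max_restarts=5, scale=1e-6,
                             rng=np.random.default_rng(0)):
    # Sketch: if the solver stops where the Hessian has a negative eigenvalue
    # (a maximum or saddle point), nudge the guess slightly and try again.
    x0 = np.asarray(x0, dtype=float)
    for _ in range(max_restarts):
        res = o.minimize(fun, x0, jac=True, hess=hess, method="newton-cg")
        if np.all(np.linalg.eigvalsh(hess(res.x)) >= 0):
            return res                          # looks like a (local) minimum
        x0 = res.x + rng.normal(scale=scale, size=res.x.shape)
    return res

res = minimize_avoiding_maxima(get_f_df, (0, 0), hess=hess)
print(res.x, res.fun)   # ends up at a minimum such as (pi, pi) or (-pi, -pi)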