Say I have a non-convex objective function loss that takes an np.ndarray named X of shape (n,) and returns a float. The objective has a great many local minima, since it is essentially a function of np.round(X * c, 2), where c is another constant array of shape (n,). You can imagine something like this:
import numpy as np
from scipy.optimize import rosen

def loss(X: np.ndarray) -> float:
    c = np.array([0.1, 0.5, -0.8, 7.0, 0.0])
    X_rounded = np.round(X * c, 2)
    return rosen(X_rounded)
The linear constraint is expressed with two constant matrices (also stored as numpy ndarrays): A of shape (m, n) and b of shape (m,). I need to minimize loss with respect to X while keeping A.dot(X) == b. In addition, each element of X must satisfy 0 <= X_i <= 2, and I have a decent initial guess X0 = [1, 1, ..., 1].

I don't need a global minimum; the search can stop as soon as loss(X) <= 1. The objective is mostly written in SQL and thus absurdly slow, so I also want the optimization to terminate once loss has been evaluated ~200 times. (This is not a hard requirement; terminating after the optimization has been running for, say, 5 minutes would also be fine.)
With scipy, I can do

from scipy.optimize import minimize

rv = minimize(loss,
              initial_guess,
              method='SLSQP',
              bounds=[(0, 2)] * n,
              constraints={
                  'type': 'eq',
                  'fun': lambda x: A.dot(x) - b
              },
              options={'maxiter': 5})
I'm not satisfied with this solution because the results are worse than the artificial initial guesses (which are sampled around the global minimum as a smoke test), presumably due to the abundance of local minima and some numerical issues. In addition, I cannot estimate the number of objective invocations per iteration (otherwise I could bound the number of invocations by setting maxiter).
How can I do better with mystic, which is presumably more flexible?
Since I don't know what A and b are, I'm going to improvise. So it's not going to be exactly the same as your problem, but should be close enough.
Let's set up the problem by building the loss function and the constraints. There may be a better way to build the constraint, but the following is pretty general (albeit a bit ugly):
>>> import mystic as my
>>> import numpy as np
>>> from mystic.models import rosen
>>>
>>> A = np.array([[9., 0., 0., 8., -1],
... [1., 1., -1., 0., 0.],
... [2., -2., 6., 0., 5.]])
>>> b = np.array([18., .75, 11.5])
>>> c = np.array([0.1, 0.5, -0.8, 7.0, 0.0])
>>>
>>> def loss(x):
... x_rounded = np.round(x * c, 2)
... return rosen(x_rounded)
...
>>> cons = my.symbolic.linear_symbolic(A, b)
>>> cons = my.symbolic.solve(cons)
>>> cons = my.symbolic.generate_constraint(my.symbolic.generate_solvers(cons))
>>> bounds = [(0,2)] * len(c)
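As an aside, the cons object produced above is just a function that maps a candidate x onto a point satisfying A.dot(x) == b, so you can sanity-check it directly before running any solver (a quick sketch, not part of the session above):

x_ = cons([1.0] * len(c))           # project an arbitrary point onto the constraint surface
print(np.allclose(A.dot(x_), b))    # should print True if the symbolic solve succeeded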
Then try to solve for the global minimum:
>>> stepmon = my.monitors.VerboseMonitor(1)
>>> rv = my.solvers.diffev2(loss, x0=bounds, bounds=bounds, constraints=cons, itermon=stepmon, disp=1, npop=20)
Generation 0 has ChiSquare: 15478.596962
Generation 1 has ChiSquare: 1833.714503
Generation 2 has ChiSquare: 1833.714503
Generation 3 has ChiSquare: 270.601079
Generation 4 has ChiSquare: 160.690618
Generation 5 has ChiSquare: 160.690618
Generation 6 has ChiSquare: 127.289639
Generation 7 has ChiSquare: 127.289639
Generation 8 has ChiSquare: 127.289639
Generation 9 has ChiSquare: 123.054668
Generation 10 has ChiSquare: 123.054668
Generation 11 has ChiSquare: 123.054668
Generation 12 has ChiSquare: 122.561794
Generation 13 has ChiSquare: 121.069338
Generation 14 has ChiSquare: 120.828279
Generation 15 has ChiSquare: 117.732442
Generation 16 has ChiSquare: 117.732442
Generation 17 has ChiSquare: 117.340042
Generation 18 has ChiSquare: 117.340042
Generation 19 has ChiSquare: 117.340042
Generation 20 has ChiSquare: 117.340042
Generation 21 has ChiSquare: 117.340042
Generation 22 has ChiSquare: 116.750933
Generation 23 has ChiSquare: 116.750933
Generation 24 has ChiSquare: 116.750933
Generation 25 has ChiSquare: 116.750933
Generation 26 has ChiSquare: 116.750933
Generation 27 has ChiSquare: 116.750933
Generation 28 has ChiSquare: 116.750933
Generation 29 has ChiSquare: 116.750933
Generation 30 has ChiSquare: 116.750933
Generation 31 has ChiSquare: 116.750933
Generation 32 has ChiSquare: 116.750933
Generation 33 has ChiSquare: 116.750933
Generation 34 has ChiSquare: 116.750933
Generation 35 has ChiSquare: 116.750933
Generation 36 has ChiSquare: 116.750933
Generation 37 has ChiSquare: 116.750933
Generation 38 has ChiSquare: 116.750933
Generation 39 has ChiSquare: 116.750933
Generation 40 has ChiSquare: 116.750933
Generation 41 has ChiSquare: 116.750933
Generation 42 has ChiSquare: 116.750933
Generation 43 has ChiSquare: 116.750933
Generation 44 has ChiSquare: 116.750933
Generation 45 has ChiSquare: 116.750933
Generation 46 has ChiSquare: 116.750933
Generation 47 has ChiSquare: 116.750933
Generation 48 has ChiSquare: 116.750933
Generation 49 has ChiSquare: 116.750933
Generation 50 has ChiSquare: 116.750933
Generation 51 has ChiSquare: 116.750933
STOP("VTRChangeOverGeneration with {'ftol': 0.005, 'gtol': 1e-06, 'generations': 30, 'target': 0.0}")
Optimization terminated successfully.
Current function value: 116.750933
Iterations: 51
Function evaluations: 1040
>>> A.dot(rv)
array([18. , 0.75, 11.5 ])
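The box bounds are respected as well; if you want to verify that too, a quick check (not shown in the original session) would be:

x_best = np.asarray(rv)
print(np.all((x_best >= 0) & (x_best <= 2)))    # True if the solution stays inside the [0, 2] box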
That works (it's probably still not the global minimum)... but it takes some time. So, let's try a faster local solver.
>>> stepmon = my.monitors.VerboseMonitor(1)
>>> rv = my.solvers.fmin_powell(loss, x0=[1]*len(c), bounds=bounds, constraints=cons, itermon=stepmon, disp=1)
Generation 0 has ChiSquare: 244559.856997
Generation 1 has ChiSquare: 116357.59447400003
Generation 2 has ChiSquare: 121.23445799999999
Generation 3 has ChiSquare: 117.635447
Generation 4 has ChiSquare: 117.59764200000001
Generation 5 has ChiSquare: 117.59764200000001
Optimization terminated successfully.
Current function value: 117.597642
Iterations: 5
Function evaluations: 388
STOP("NormalizedChangeOverGeneration with {'tolerance': 0.0001, 'generations': 2}")
>>> A.dot(rv)
array([18. , 0.75, 11.5 ])
Not bad. However, you wanted to limit the number of evaluations of loss, and also to be able to stop once loss is close enough to the minimum... so let's say stop when loss(x) <= 120. I'll also limit the number of function evaluations to 200.
>>> stepmon = my.monitors.VerboseMonitor(1)
>>> rv = my.solvers.fmin_powell(loss, x0=[1]*len(c), bounds=bounds, constraints=cons, itermon=stepmon, disp=1, maxfun=200, gtol=None, ftol=120)
Generation 0 has ChiSquare: 244559.856997
Generation 1 has ChiSquare: 116357.59447400003
Generation 2 has ChiSquare: 121.23445799999999
Generation 3 has ChiSquare: 117.635447
Optimization terminated successfully.
Current function value: 117.635447
Iterations: 3
Function evaluations: 175
STOP("VTRChangeOverGeneration with {'ftol': 120, 'gtol': 1e-06, 'generations': 30, 'target': 0.0}")
>>> A.dot(rv)
array([18. , 0.75, 11.5 ])
>>> rv
array([1.93873933, 0.00381084, 1.19255017, 0.0807893 , 0.0949684 ])
There's even more flexibility if you use the class interface to the solvers, but I'll leave that for another time.
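For reference, here's a minimal sketch of what that might look like with the Powell solver class (assuming the same loss, cons, and bounds setup as above; the VTR termination and the 200-evaluation cap are my choices, mirroring the last run):

from mystic.solvers import PowellDirectionalSolver
from mystic.monitors import VerboseMonitor
from mystic.termination import VTR

solver = PowellDirectionalSolver(len(c))               # 5-dimensional problem
solver.SetInitialPoints([1] * len(c))                  # start from x0 = [1, 1, ..., 1]
solver.SetStrictRanges([0] * len(c), [2] * len(c))     # box bounds 0 <= x_i <= 2
solver.SetConstraints(cons)                            # the A.dot(x) == b constraint from above
solver.SetGenerationMonitor(VerboseMonitor(1))
solver.SetEvaluationLimits(evaluations=200)            # cap the number of loss evaluations
solver.Solve(loss, termination=VTR(120))               # VTR stops once |loss - 0| <= 120
rv = solver.Solution()

The class interface lets you mix and match termination conditions, monitors, and evaluation limits more freely than the one-line solver functions do.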