python, optimization, constraints, non-convex, mystic

Minimizing non-convex function with linear constraint and bound in mystic


Say I have a non-convex objective function loss that takes a np.ndarray named X of shape (n,) and returns a float. The objective has a great many local minima, since it is essentially a function of np.round(X * c, 2), where c is another constant array of shape (n,).

You can imagine something like this:

import numpy as np
from scipy.optimize import rosen

def loss(X: np.ndarray) -> float:
    c = np.array([0.1, 0.5, -0.8, 7.0, 0.0])
    X_rounded = np.round(X * c, 2)  # rounding creates plateaus, hence the many local minima
    return rosen(X_rounded)

The linear constraint is expressed with two constant numpy arrays: A of shape (m, n) and b of shape (m,). I need to minimize loss with respect to X while keeping A.dot(X) == b. In addition, each element of X must satisfy 0 <= X_i <= 2, and I have a decent initial guess X0 = [1, 1, ..., 1].

I don't need a global minimum; the search can stop as soon as loss(X) <= 1. The objective is mostly written in SQL and is thus absurdly slow, so I also want the optimization to terminate once loss has been evaluated ~200 times. (This is not a hard requirement; terminating after the optimization has been running for, say, 5 minutes would also work.)

With scipy, I can do

from scipy.optimize import minimize

rv = minimize(loss,
              X0,
              method='SLSQP',
              bounds=[(0, 2)] * n,
              constraints={
                  'type': 'eq',
                  'fun': lambda x: A.dot(x) - b
              },
              options={
                  'maxiter': 5
              })

I'm not satisfied with this solution because the results are worse than artificial initial guesses (which are sampled around the global minimum as a smoke test), presumably due to the abundance of local minima and some numerical issues. In addition, I cannot estimate the number of objective invocations per iteration (otherwise I could bound the number of invocations by setting maxiter); a counting wrapper like the one below at least lets me measure it after the fact.
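
For example (a minimal sketch; eval_count and counted_loss are just illustrative names, and loss, X0, A, b, and n are as above):

eval_count = 0

def counted_loss(x):
    global eval_count
    eval_count += 1
    return loss(x)

rv = minimize(counted_loss,
              X0,
              method='SLSQP',
              bounds=[(0, 2)] * n,
              constraints={'type': 'eq', 'fun': lambda x: A.dot(x) - b},
              options={'maxiter': 5})
print(eval_count, rv.nit)  # total objective calls vs. completed iterations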

How can I do better with mystic, which is presumably more flexible?


Solution

  • Since I don't know what A and b are, I'm going to improvise. So it's not going to be exactly the same as your problem, but should be close enough.

    Let's set up the problem by building the loss function and the constraints. There may be a better way to build the constraint, but the following is pretty general (albeit a bit ugly):

    >>> import mystic as my
    >>> import numpy as np
    >>> from mystic.models import rosen
    >>>
    >>> A = np.array([[9., 0., 0., 8., -1],
    ...               [1., 1., -1., 0., 0.],
    ...               [2., -2., 6., 0., 5.]])
    >>> b = np.array([18., .75, 11.5])
    >>> c = np.array([0.1, 0.5, -0.8, 7.0, 0.0])
    >>>
    >>> def loss(x):
    ...     x_rounded = np.round(x * c, 2)
    ...     return rosen(x_rounded)
    ...
    >>> cons = my.symbolic.linear_symbolic(A, b)
    >>> cons = my.symbolic.solve(cons)
    >>> cons = my.symbolic.generate_constraint(my.symbolic.generate_solvers(cons))
    >>> bounds = [(0,2)] * len(c)
    
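    As a quick sanity check (this snippet is my addition, not part of the original run), the generated cons maps any candidate x onto the set where A.dot(x) == b, so the following should hold up to floating-point tolerance:

    >>> x_feasible = cons([1., 1., 1., 1., 1.])
    >>> assert np.allclose(A.dot(x_feasible), b)  # cons enforces A @ x == b
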

    Then try to solve for the global minimum:

    >>> stepmon = my.monitors.VerboseMonitor(1)
    >>> rv = my.solvers.diffev2(loss, x0=bounds, bounds=bounds, constraints=cons, itermon=stepmon, disp=1, npop=20)
    Generation 0 has ChiSquare: 15478.596962
    Generation 1 has ChiSquare: 1833.714503
    Generation 2 has ChiSquare: 1833.714503
    Generation 3 has ChiSquare: 270.601079
    Generation 4 has ChiSquare: 160.690618
    Generation 5 has ChiSquare: 160.690618
    Generation 6 has ChiSquare: 127.289639
    Generation 7 has ChiSquare: 127.289639
    Generation 8 has ChiSquare: 127.289639
    Generation 9 has ChiSquare: 123.054668
    Generation 10 has ChiSquare: 123.054668
    Generation 11 has ChiSquare: 123.054668
    Generation 12 has ChiSquare: 122.561794
    Generation 13 has ChiSquare: 121.069338
    Generation 14 has ChiSquare: 120.828279
    Generation 15 has ChiSquare: 117.732442
    Generation 16 has ChiSquare: 117.732442
    Generation 17 has ChiSquare: 117.340042
    Generation 18 has ChiSquare: 117.340042
    Generation 19 has ChiSquare: 117.340042
    Generation 20 has ChiSquare: 117.340042
    Generation 21 has ChiSquare: 117.340042
    Generation 22 has ChiSquare: 116.750933
    Generation 23 has ChiSquare: 116.750933
    Generation 24 has ChiSquare: 116.750933
    Generation 25 has ChiSquare: 116.750933
    Generation 26 has ChiSquare: 116.750933
    Generation 27 has ChiSquare: 116.750933
    Generation 28 has ChiSquare: 116.750933
    Generation 29 has ChiSquare: 116.750933
    Generation 30 has ChiSquare: 116.750933
    Generation 31 has ChiSquare: 116.750933
    Generation 32 has ChiSquare: 116.750933
    Generation 33 has ChiSquare: 116.750933
    Generation 34 has ChiSquare: 116.750933
    Generation 35 has ChiSquare: 116.750933
    Generation 36 has ChiSquare: 116.750933
    Generation 37 has ChiSquare: 116.750933
    Generation 38 has ChiSquare: 116.750933
    Generation 39 has ChiSquare: 116.750933
    Generation 40 has ChiSquare: 116.750933
    Generation 41 has ChiSquare: 116.750933
    Generation 42 has ChiSquare: 116.750933
    Generation 43 has ChiSquare: 116.750933
    Generation 44 has ChiSquare: 116.750933
    Generation 45 has ChiSquare: 116.750933
    Generation 46 has ChiSquare: 116.750933
    Generation 47 has ChiSquare: 116.750933
    Generation 48 has ChiSquare: 116.750933
    Generation 49 has ChiSquare: 116.750933
    Generation 50 has ChiSquare: 116.750933
    Generation 51 has ChiSquare: 116.750933
    STOP("VTRChangeOverGeneration with {'ftol': 0.005, 'gtol': 1e-06, 'generations': 30, 'target': 0.0}")
    Optimization terminated successfully.
             Current function value: 116.750933
             Iterations: 51
             Function evaluations: 1040
    
    >>> A.dot(rv)
    array([18.  ,  0.75, 11.5 ])
    
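    As an aside (my addition, not from the original answer): the monitor also stores the full history, so you can inspect convergence after the fact:

    >>> best_y = stepmon.y  # best score recorded at each generation
    >>> best_x = stepmon.x  # the corresponding parameter vectors
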

    That works (it's probably still not the global minimum)... but it takes some time: with npop=20, those 51 generations cost 1040 evaluations of loss. So, let's try a faster local solver.

    >>> stepmon = my.monitors.VerboseMonitor(1)
    >>> rv = my.solvers.fmin_powell(loss, x0=[1]*len(c), bounds=bounds, constraints=cons, itermon=stepmon, disp=1)
    Generation 0 has ChiSquare: 244559.856997
    Generation 1 has ChiSquare: 116357.59447400003
    Generation 2 has ChiSquare: 121.23445799999999
    Generation 3 has ChiSquare: 117.635447
    Generation 4 has ChiSquare: 117.59764200000001
    Generation 5 has ChiSquare: 117.59764200000001
    Optimization terminated successfully.
             Current function value: 117.597642
             Iterations: 5
             Function evaluations: 388
    STOP("NormalizedChangeOverGeneration with {'tolerance': 0.0001, 'generations': 2}")
    
    >>> A.dot(rv)
    array([18.  ,  0.75, 11.5 ])
    
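    (A side note, and my addition: if you need the exact number of loss calls, I believe the one-liner solvers also accept an evaluation monitor via the evalmon keyword, which records every single evaluation; double-check the keyword against mystic's docs.)

    >>> evalmon = my.monitors.Monitor()
    >>> rv = my.solvers.fmin_powell(loss, x0=[1]*len(c), bounds=bounds, constraints=cons, evalmon=evalmon, disp=0)
    >>> n_evals = len(evalmon.y)  # total number of loss evaluations recorded
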

    Not bad. However, you wanted to limit the number of evaluations of loss, and also to be able to stop once loss is close to the minimum... so let's say we stop when loss(x) <= 120. I'll also cap the number of function evaluations at 200 with maxfun. Note that passing gtol=None disables the change-over-generation test, so ftol=120 is picked up as the stop value by the VTR-style termination (see the STOP message below).

    >>> stepmon = my.monitors.VerboseMonitor(1)
    >>> rv = my.solvers.fmin_powell(loss, x0=[1]*len(c), bounds=bounds, constraints=cons, itermon=stepmon, disp=1, maxfun=200, gtol=None, ftol=120)
    Generation 0 has ChiSquare: 244559.856997
    Generation 1 has ChiSquare: 116357.59447400003
    Generation 2 has ChiSquare: 121.23445799999999
    Generation 3 has ChiSquare: 117.635447
    Optimization terminated successfully.
             Current function value: 117.635447
             Iterations: 3
             Function evaluations: 175
    STOP("VTRChangeOverGeneration with {'ftol': 120, 'gtol': 1e-06, 'generations': 30, 'target': 0.0}")
    
    >>> A.dot(rv)
    array([18.  ,  0.75, 11.5 ])
    >>> rv
    array([1.93873933, 0.00381084, 1.19255017, 0.0807893 , 0.0949684 ])
    

    There's even more flexibility if you use the class interface to the solvers; I'll leave the details for another time, but a rough sketch follows.
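
    Here's what that might look like (my reconstruction from mystic's docs, untested here, so treat it as a sketch):

    >>> from mystic.solvers import PowellDirectionalSolver
    >>> from mystic.termination import VTR
    >>>
    >>> solver = PowellDirectionalSolver(len(c))
    >>> solver.SetInitialPoints([1]*len(c))
    >>> solver.SetStrictRanges([0]*len(c), [2]*len(c))  # 0 <= x_i <= 2
    >>> solver.SetConstraints(cons)
    >>> solver.SetEvaluationLimits(evaluations=200)  # hard cap on loss calls
    >>> solver.SetGenerationMonitor(my.monitors.Monitor())
    >>> solver.Solve(loss, termination=VTR(120))  # stop once loss(x) <= 120
    >>> rv = solver.Solution()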