Tags: python, scipy, scipy-optimize

How are the `bounds` values in `scipy.optimize.minimize` distributed?


I have a conceptual question: how are the bound values in

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

distributed?

Are they sequenced linearly? What is the distribution of the parameter values? If the distribution is non-linear, then how are the candidate values searched?


Solution

  • What is the distribution of the parameter values?

    scipy.optimize.minimize() makes no assumptions about what distribution your parameter values come from. It doesn't need to make this assumption, because it is not a global optimizer.

    Let's compare and contrast some other optimizers.

    The scipy.optimize.brute() optimizer searches for a function minimum by "computing the function’s value at each point of a multidimensional grid of points, to find the global minimum of the function." This creates a bunch of linearly spaced points, and searches every combination of points for every variable. This is a global optimizer - it is searching the entire search space. The cost of this is that it is spending a lot of time searching regions that aren't anywhere near the minimum. From a probability perspective, you could think of this as drawing the parameters to search from a uniform distribution. (Although, this is a slightly weird way to describe it, as brute() is completely deterministic.)
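    For illustration, here is a minimal sketch of that grid search (the quadratic test function and the (-5, 5) bounds are made up for this example):

        from scipy import optimize

        def f(x):
            # toy objective with its minimum at (1, -2)
            return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

        # brute() lays a regular grid over each (min, max) range: Ns=11 gives 11
        # linearly spaced points per variable, so 121 evaluations in total.
        # finish=None skips the local polishing step, so only the grid is searched.
        best = optimize.brute(f, ranges=((-5, 5), (-5, 5)), Ns=11, finish=None)
        print(best)  # the grid point nearest the true minimum, here exactly (1, -2)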

    The scipy.optimize.dual_annealing() optimizer searches for a function minimum with an algorithm related to simulated annealing. This is also a global optimizer, but it focuses on the area around the currently accepted solution: it draws new parameters to search from a Cauchy-Lorentz visiting distribution centered on that solution.
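    A comparable sketch for dual_annealing() (same toy objective and bounds as above; the seed value is arbitrary). Note that here the bounds are required, and candidates are drawn stochastically rather than laid out on a grid:

        from scipy import optimize

        def f(x):
            return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

        # dual_annealing() requires bounds and proposes new candidates from a
        # Cauchy-Lorentz visiting distribution around the current accepted point,
        # occasionally accepting uphill moves so it can escape local minima.
        result = optimize.dual_annealing(f, bounds=[(-5, 5), (-5, 5)], seed=42)
        print(result.x)  # approximately (1, -2)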

    With these examples in mind, let's talk about minimize() again.

  • If the distribution is non-linear, then how are the candidate values searched?

    In contrast to those global optimizers, minimize() is a local optimizer. It supports a variety of methods, but in general, those methods do the following (see the sketch after this list):

    1. Start with an initial guess x0.
    2. Find the function gradient at this point, by numerically differentiating the function.
    3. Determine which direction, if any, is downhill.
    4. Move in that direction. This is the new x.
    5. Return to step 2, and repeat until no meaningful improvement can be made.
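    Here is a minimal sketch of that loop in practice (again with the toy quadratic and box bounds; L-BFGS-B is just one of the methods that accepts bounds). Note that the bounds only constrain where the optimizer may step; candidate values are never sampled from them:

        from scipy import optimize

        def f(x):
            return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

        # A local, gradient-based search: start at x0, numerically estimate the
        # gradient, step downhill, and repeat. bounds just clip the steps to a box.
        result = optimize.minimize(
            f,
            x0=[4.0, 4.0],
            method="L-BFGS-B",       # a local method that supports bounds
            bounds=[(-5, 5), (-5, 5)],
        )
        print(result.x)     # approximately (1, -2)
        print(result.nfev)  # typically far fewer evaluations than a grid search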

    This makes some assumptions, but they are not assumptions about the parameter distribution. Specifically, it assumes the function is smooth enough that a numerical gradient is meaningful, and that the initial guess x0 is close enough to the minimum you care about, since the search only ever moves downhill from there. Any bounds you pass simply clip the region the optimizer is allowed to step into; candidate values are not drawn from them.

    The advantage of this approach is that it usually takes fewer function evaluations than a global optimizer.