I have a function of 3 variables that I want to optimize. Unfortunately, the variables have different orders of magnitude, meaning that the problem is very ill-conditioned. I'm handling this by multiplying the optimization variable by a conditioning matrix. (In the example below, my conditioning matrix is diagonal, so I've implemented it as a vector. There may be applications for non-diagonal conditioning matrices.)
```python
import numpy as np
from scipy.optimize import minimize

def penalty(x):
    # do stuff
    return  # result

conditioner = np.array([1000, 1, 0.1])
x0 = np.array([0.003, 1.415, 9.265]) * conditioner
optim_result = minimize(penalty, x0)
bestfit = optim_result.x / conditioner
```
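(As an aside on the non-diagonal case mentioned above, here is a hypothetical sketch of how the same bookkeeping would look with a full conditioning matrix `C`, where the elementwise multiply and divide become a matrix product and a linear solve.)

```python
# Hypothetical sketch of the non-diagonal case: the elementwise multiply and
# divide above become a matrix product and a linear solve.
C = np.array([[1000.0, 0.0, 0.0],
              [0.0,    1.0, 0.0],
              [0.0,    0.0, 0.1]])               # any invertible conditioning matrix
x_physical = np.array([0.003, 1.415, 9.265])
x_conditioned = C @ x_physical                   # into conditioned coordinates
x_recovered = np.linalg.solve(C, x_conditioned)  # back to physical coordinates
```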
Is there a way to do this that doesn't require the manual application of `conditioner`, or require me to keep track of which data are conditioned (e.g., `bestfit`) and which are not (e.g., `optim_result`)?
I suppose I could put some of the arithmetic inside the penalty function, like this:
```python
def penalty(x):
    x_cond = x * conditioner
    # do stuff
    return  # result
```
but it only solves half of the problem.
A simple way of doing that would be to write a wrapper for `minimize()` which handles the task of converting into and out of this new coordinate system. At minimum, this wrapper needs to do three things:

- scale the initial guess into the new coordinate system before starting the optimization,
- undo that scaling inside the objective function, so the objective is always evaluated in the original coordinates, and
- convert the solution the optimizer finds back into the original coordinate system.

That can be done like this:
```python
import numpy as np
import scipy.optimize

def minimize_with_parameter_scaling(f, x0, typical_x_value=None, *args, **kwargs):
    def wrapped_f(x, *args, **kwargs):
        # Undo the scaling before evaluating the objective, so f always
        # sees parameters in their original units.
        return f(x * typical_x_value, *args, **kwargs)

    x0 = np.asarray(x0)
    if typical_x_value is None:
        typical_x_value = np.ones_like(x0)
    typical_x_value = np.asarray(typical_x_value)
    assert (typical_x_value != 0).all(), "parameter scale cannot be zero"
    # Scale the initial guess so that every coordinate is of order 1.
    x0_scaled = x0 / typical_x_value
    result = scipy.optimize.minimize(wrapped_f, x0_scaled, *args, **kwargs)
    # Convert the solution back into the original coordinate system.
    result.x = result.x * typical_x_value
    return result
```
Warning: `typical_x_value` has the inverse meaning of `conditioner` in your example: it represents how large a typical value of x is.
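To make that concrete (using the `conditioner` values from the question, which multiply x to bring every coordinate to order 1), the corresponding scale vector is simply its elementwise reciprocal:

```python
conditioner = np.array([1000, 1, 0.1])
# The conditioner multiplies x to bring each coordinate to order 1, so the
# typical size of each coordinate is its reciprocal.
typical_x_value = 1 / conditioner   # array([1.e-03, 1.e+00, 1.e+01])
```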
This function could be called like this:
```python
x0 = [0, 0, 0]
scale = [1, 0.01, 1]
result = minimize_with_parameter_scaling(wacky_rosenbrock, x0, scale)
```
Let's also try this out on an example problem, to see if the hypothesis that conditioning will help is correct.
The example problem I chose is the Rosenbrock function, which is designed to test nonlinear optimizers. It is a nonlinear function with a single global minimum, but its derivatives are misleading to optimizers, which is what makes it a hard problem.
To provide a problem with inappropriate scale, I made one of the coordinates have a different scale than the others.
```python
def wacky_rosenbrock(x):
    # Code copied from scipy.optimize.rosen
    x = np.array(x, dtype=float)  # copy, so the optimizer's own array is never modified in place
    x[1] *= 100  # Second coordinate is multiplied by 100
    r = np.sum(100.0 * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0,
               axis=0)
    return r
```
I then attempted to solve `wacky_rosenbrock()` in three dimensions with a variety of SciPy's solvers, once without scaling and once with a scaling of `[1, 0.01, 1]`. In the table, nfev is the number of function evaluations it took to reach the solution, and the error is the percentage error of the optimizer's solution versus the actual best solution. The last column shows how many times more function calls it takes without scaling. A sketch of the comparison loop follows, then the results.
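(The exact benchmark script is not reproduced here; the following is a minimal sketch of such a comparison. The error metric shown, distance to the known minimum of `wacky_rosenbrock()` at `[1, 0.01, 1]`, is an assumption, and the `cobyqa` method requires a recent SciPy release.)

```python
# Sketch of the comparison loop; relies on wacky_rosenbrock() and
# minimize_with_parameter_scaling() defined above.
methods = ["nelder-mead", "powell", "cg", "bfgs", "l-bfgs-b",
           "tnc", "cobyla", "cobyqa", "slsqp", "trust-constr"]
x0 = [0, 0, 0]
scale = [1, 0.01, 1]
# wacky_rosenbrock's minimum is rosen's minimum (1, 1, 1) with the second
# coordinate divided by 100.
x_best = np.array([1.0, 0.01, 1.0])

def pct_error(result):
    # One plausible error measure: relative distance to the known minimum.
    return 100 * np.linalg.norm(result.x - x_best) / np.linalg.norm(x_best)

for method in methods:
    plain = scipy.optimize.minimize(wacky_rosenbrock, x0, method=method)
    scaled = minimize_with_parameter_scaling(wacky_rosenbrock, x0, scale,
                                             method=method)
    print(f"{method}: error {pct_error(plain):.3f}% -> {pct_error(scaled):.3f}%, "
          f"nfev {plain.nfev} -> {scaled.nfev}, "
          f"improvement {plain.nfev / scaled.nfev:.3g}x")
```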
| method | error (no scaling) | nfev (no scaling) | error (scaling) | nfev (scaling) | improvement in function call count |
|---|---|---|---|---|---|
| nelder-mead | 0.001% | 305 | 0.003% | 289 | 1.05536 |
| powell | 0.0% | 862 | 0.0% | 886 | 0.972912 |
| cg | 0.239% | 1772 | 0.0% | 248 | 7.14516 |
| bfgs | 0.035% | 340 | 0.001% | 148 | 2.2973 |
| l-bfgs-b | 0.023% | 268 | 0.0% | 140 | 1.91429 |
| tnc | 37.086% | 404 | 0.015% | 400 | 1.01 |
| cobyla | 98.729% | 39 | 11.214% | 4087 | 0.00954245 |
| cobyqa | 0.008% | 1406 | 0.0% | 157 | 8.95541 |
| slsqp | 0.042% | 128 | 0.049% | 112 | 1.14286 |
| trust-constr | 0.038% | 264 | 0.001% | 232 | 1.13793 |
In these results, I want to point out the following things: