I want to solve the following (relaxed, i.e. v(t) ∈ [0, 1]) optimal control problem with cyipopt:
Here's what I have so far to solve the discretized problem:
import numpy as np
import matplotlib.pyplot as plt
from cyipopt import minimize_ipopt
from scipy.optimize._numdiff import approx_derivative
# z = (x1(t0) .... x1(tN) x2(t0) .... x2(tN) v(t0) .... v(tN))^T
def objective(z, time):
x0, x1, v = np.split(z, 3)
res = 0.0
for i in range(time.size-1):
h = time[i+1] - time[i]
res += h*((x0[i]-1)**2 + (x1[i]-1)**2)
return res
def ode_rhs(t, x, v):
x0, x1 = x
xdot1 = x0 - x0*x1 - 0.4*x0*v
xdot2 = -x1 + x0*x1 - 0.2*x1*v
return np.array([xdot1, xdot2])
def constraint(z, time):
x0, x1, v = np.split(z, 3)
x = np.array([x0, x1])
res = np.zeros((2, x0.size))
# initial values
res[:, 0] = x[:, 0] - np.array([0.5, 0.7])
# 'solve' the ode-system
for j in range(time.size-1):
h = time[j+1] - time[j]
# implicite euler scheme
res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
return res.flatten()
# time grid
tspan = [0, 12]
dt = 0.1
time = np.arange(tspan[0], tspan[1] + dt, dt)
# initial point
z0 = 0.1 + np.zeros(time.size*3)
# variable bounds
bnds = [(None, None) if i < 2*time.size else (0, 1) for i in range(z0.size)]
# constraints:
cons = [{
'type': 'eq',
'fun': lambda z: constraint(z, time),
'jac': lambda z: approx_derivative(lambda zz: constraint(zz, time), z)
}]
# call the solver
res = minimize_ipopt(lambda z: objective(z, time), x0=z0, bounds=bnds,
constraints=cons, options = {'disp': 5})
The code works as expected. However, it runs quite slow. Any ideas on how I can speed up the solver?
By analyzing Ipopt's output
Total CPU secs in IPOPT (w/o function evaluations) = 30.153
Total CPU secs in NLP function evaluations = 203.782
we can see that the evaluation of your functions is the bottleneck. So let's try to profile your code as Tom suggested in the comments:
In [2]: %timeit objective(z0, time)
307 µs ± 6.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: %timeit constraint(z0, time)
1.38 ms ± 4.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Okay, not bad. But we can do better. As a rule of thumb, try to prevent loops in numerical python code whenever possible. You can find some best numpy practices e.g. in Jake VanderPlas awesome talk at the PyCon2015. Your objective is equivalent to:
def objective(z, time):
x0, x1, v = np.split(z, 3)
h = time[1:] - time[:-1]
return np.sum(h*((x0[1:]-1)**2 + (x1[1:]-1)**2))
Similarly, you can remove the loop inside your constraint
function. Note that
# 'solve' the ode-system
for j in range(time.size-1):
h = time[j+1] - time[j]
# implicite euler scheme
res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
is the same as
h = time[1:] - time[:-1]
res[:, 1:] = x[:, 1:] - x[:, :-1] - h * ode_rhs(time, x[:, 1:], v[:-1])
Timing the functions again, we get
In [4]: %timeit objective(z0, time)
31.8 µs ± 683 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit constraint(z0, time)
54.1 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
i.e. speedups with factors 10x and 25x! Consequently, we can significantly reduce the solver runtime:
Total CPU secs in IPOPT (w/o function evaluations) = 30.906
Total CPU secs in NLP function evaluations = 46.950
However, note that calculating the gradient and jacobian numerically by finite differences is still computationally expensive and prone to rounding errors:
In [6]: %timeit approx_derivative(lambda zz: objective(zz, time), z0)
232 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit approx_derivative(lambda zz: constraint(zz, time), z0)
642 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Instead, we can go one step further and calculate both via algorithmic differentiation (AD) by means of the jax library:
from jax.config import config
# enable 64 bit floating point precision
config.update("jax_enable_x64", True)
import jax.numpy as np
from jax import grad, jacfwd, jit
Then, we only need to change the constraint
function as follows:
def constraint(z, time):
x0, x1, v = np.split(z, 3)
x = np.array([x0, x1])
res = np.zeros((2, x0.size))
# initial values
res = res.at[:, 0].set(x[:, 0] - np.array([0.5, 0.7]))
h = time[1:] - time[:-1]
res = res.at[:, 1:].set(x[:, 1:] - x[:, :-1] - h*ode_jit(time[1:], x[:, 1:], v[:-1]))
return res.flatten()
since item assignments are not supported, see here. Next, we just-in-time (jit) compile the functions:
# jit the functions
ode_jit = jit(ode_rhs)
obj_jit = jit(lambda z: objective(z, time))
con_jit = jit(lambda z: constraint(z, time))
# Build and jit the derivatives
obj_grad = jit(grad(obj_jit)) # objective gradient
con_jac = jit(jacfwd(con_jit)) # constraint jacobian
# Dummy first call in order to compile the functions
print("Compiling the functions...")
_ = obj_jit(z0), con_jit(z0), obj_grad(z0), con_jac(z0)
print("Done.")
Timing again, we obtain
In [10]: %timeit obj_grad(z0)
62.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [11]: %timeit con_jac(z0)
204 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
i.e. speedups with factors 3740x and 2260x. Finally, we can pass the exact gradient and jacobians:
# constraints:
cons = [{'type': 'eq', 'fun': con_jit, 'jac': con_jac}]
# call the solver
res = minimize_ipopt(obj_jit, x0=z0, jac=obj_grad, bounds=bnds,
constraints=cons, options={'disp': 5})
and obtain
Total CPU secs in IPOPT (w/o function evaluations) = 35.348
Total CPU secs in NLP function evaluations = 1.691