pythonnumpymathematical-optimizationipoptcyipopt

How to speed-up the Ipopt solver?


I want to solve the following (relaxed, i.e. v(t) ∈ [0, 1]) optimal control problem with cyipopt:

enter image description here

Here's what I have so far to solve the discretized problem:

import numpy as np
import matplotlib.pyplot as plt
from cyipopt import minimize_ipopt
from scipy.optimize._numdiff import approx_derivative

# z = (x1(t0) .... x1(tN) x2(t0) .... x2(tN) v(t0) .... v(tN))^T

def objective(z, time):
    x0, x1, v = np.split(z, 3)
    res = 0.0
    for i in range(time.size-1):
        h = time[i+1] - time[i]
        res += h*((x0[i]-1)**2 + (x1[i]-1)**2)
    return res


def ode_rhs(t, x, v):
    x0, x1 = x
    xdot1 = x0 - x0*x1 - 0.4*x0*v
    xdot2 = -x1 + x0*x1 - 0.2*x1*v
    return np.array([xdot1, xdot2])


def constraint(z, time):
    x0, x1, v = np.split(z, 3)
    x = np.array([x0, x1])
    res = np.zeros((2, x0.size))

    # initial values
    res[:, 0] = x[:, 0] - np.array([0.5, 0.7])

    # 'solve' the ode-system
    for j in range(time.size-1):
        h = time[j+1] - time[j]
        # implicite euler scheme
        res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
    return res.flatten()


# time grid
tspan = [0, 12]
dt    = 0.1
time  = np.arange(tspan[0], tspan[1] + dt, dt)

# initial point
z0 = 0.1 + np.zeros(time.size*3)

# variable bounds
bnds = [(None, None) if i < 2*time.size else (0, 1) for i in range(z0.size)]

# constraints:
cons = [{
    'type': 'eq', 
    'fun': lambda z: constraint(z, time), 
    'jac': lambda z: approx_derivative(lambda zz: constraint(zz, time), z)
}]

# call the solver
res = minimize_ipopt(lambda z: objective(z, time), x0=z0, bounds=bnds, 
                     constraints=cons, options = {'disp': 5})

The code works as expected. However, it runs quite slow. Any ideas on how I can speed up the solver?


Solution

  • By analyzing Ipopt's output

    Total CPU secs in IPOPT (w/o function evaluations)   =     30.153
    Total CPU secs in NLP function evaluations           =    203.782
    

    we can see that the evaluation of your functions is the bottleneck. So let's try to profile your code as Tom suggested in the comments:

    In [2]: %timeit objective(z0, time)
    307 µs ± 6.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [3]: %timeit constraint(z0, time)
    1.38 ms ± 4.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

    Okay, not bad. But we can do better. As a rule of thumb, try to prevent loops in numerical python code whenever possible. You can find some best numpy practices e.g. in Jake VanderPlas awesome talk at the PyCon2015. Your objective is equivalent to:

    def objective(z, time):
        x0, x1, v = np.split(z, 3)
        h = time[1:] - time[:-1]
        return np.sum(h*((x0[1:]-1)**2 + (x1[1:]-1)**2))
    

    Similarly, you can remove the loop inside your constraint function. Note that

    # 'solve' the ode-system
    for j in range(time.size-1):
        h = time[j+1] - time[j]
        # implicite euler scheme
        res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
    

    is the same as

    h = time[1:] - time[:-1]
    res[:, 1:] = x[:, 1:] - x[:, :-1] - h * ode_rhs(time, x[:, 1:], v[:-1])
    

    Timing the functions again, we get

    In [4]: %timeit objective(z0, time)
    31.8 µs ± 683 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    In [5]: %timeit constraint(z0, time)
    54.1 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    i.e. speedups with factors 10x and 25x! Consequently, we can significantly reduce the solver runtime:

    Total CPU secs in IPOPT (w/o function evaluations)   =     30.906
    Total CPU secs in NLP function evaluations           =     46.950
    

    However, note that calculating the gradient and jacobian numerically by finite differences is still computationally expensive and prone to rounding errors:

    In [6]: %timeit approx_derivative(lambda zz: objective(zz, time), z0)
    232 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [7]: %timeit approx_derivative(lambda zz: constraint(zz, time), z0)
    642 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    Instead, we can go one step further and calculate both via algorithmic differentiation (AD) by means of the jax library:

    from jax.config import config
    # enable 64 bit floating point precision
    config.update("jax_enable_x64", True)
    
    import jax.numpy as np
    from jax import grad, jacfwd, jit
    

    Then, we only need to change the constraint function as follows:

    def constraint(z, time):
        x0, x1, v = np.split(z, 3)
        x = np.array([x0, x1])
        res = np.zeros((2, x0.size))
    
        # initial values
        res = res.at[:, 0].set(x[:, 0] - np.array([0.5, 0.7]))
        h = time[1:] - time[:-1]
        res = res.at[:, 1:].set(x[:, 1:] - x[:, :-1] - h*ode_jit(time[1:], x[:, 1:], v[:-1]))
        return res.flatten()
    

    since item assignments are not supported, see here. Next, we just-in-time (jit) compile the functions:

    # jit the functions
    ode_jit = jit(ode_rhs)
    obj_jit = jit(lambda z: objective(z, time))
    con_jit = jit(lambda z: constraint(z, time))
    
    # Build and jit the derivatives
    obj_grad = jit(grad(obj_jit))   # objective gradient
    con_jac = jit(jacfwd(con_jit))  # constraint jacobian
    
    # Dummy first call in order to compile the functions
    print("Compiling the functions...")
    _ = obj_jit(z0), con_jit(z0), obj_grad(z0), con_jac(z0)
    print("Done.")
    

    Timing again, we obtain

    In [10]: %timeit obj_grad(z0)
    62.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    In [11]: %timeit con_jac(z0)
    204 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

    i.e. speedups with factors 3740x and 2260x. Finally, we can pass the exact gradient and jacobians:

    # constraints:
    cons = [{'type': 'eq', 'fun': con_jit, 'jac': con_jac}]
    
    # call the solver
    res = minimize_ipopt(obj_jit, x0=z0, jac=obj_grad, bounds=bnds,
                             constraints=cons, options={'disp': 5})
    

    and obtain

    Total CPU secs in IPOPT (w/o function evaluations)   =     35.348
    Total CPU secs in NLP function evaluations           =      1.691