scipy's differential_evolution appears slower when used inside class

I have a curious case. Unfortunately, I did not manage to reproduce the computation times in the MWE below but my project is basically a more crowded version. I wanted to speedup my code so I tried out differential_evolution with worker=-1. This did not work because of pickling issues (AttributeError: Can't pickle local object); I don't really want to define my cost-function at the top level of my module. So I used worker=Pool(ncpu).map instead. In my local MWE this works, but when migrating this change to my project I observe the following.

I observe that using two different approaches I find that differential_evolution takes more then 10 times as much time to finish in the second (class-based) approach then the first approach. When I look into my CPU load for approach 1 I see that all cores are used. However in approach 2 basically only one core is being used. When I look at the output via disp=True I also see that the func evaluations are printed way slower as well in the 2nd approach compared to the 1st.

Any idea what that might be? Tx.

from os import cpu_count
from time import time
import numpy as np
from pathos.multiprocessing import ProcessingPool as Pool
from scipy.optimize import differential_evolution
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

atol=1e-13
rtol=1e-10
T = 10
bounds = [(-T,T)]

# --- Approach 1 --- #

x = np.linspace(0,10*2*np.pi,100)
y = np.cos(x)
cs = CubicSpline(x, y)
def f(t):
    return cs(t)**2

ctime = time()
def cost(t):
    t = t[0]
    q = 0
    for _ in range(50):
        q += quad(lambda s: f(s*t), -T, T)[0]
    return q
result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
print("Approach 1:", time()-ctime)

# --- Approach 2 --- #

class A():
    def define_f(self):
        x = np.linspace(0,10*2*np.pi,100)
        y = np.cos(x)
        cs = CubicSpline(x, y)
        def f(t):
            return cs(t)**2
        return f

f = A().define_f()

class B():
    def solve(self, f):
        def cost(t):
            t = t[0]
            q = 0
            for _ in range(50):
                q += quad(lambda s: f(s*t), -T, T)[0]
            return q
        
        ctime = time()
        result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
        print("Approach 2:", time()-ctime)

B().solve(f=f)

Produces something like:

Approach 1: 4.710453510284424
Approach 2: 5.638171672821045

However in my project the second timestamp is an order of magnitude larger.

Below is a vectorized version of the code I'm actually using without the provided pool.map: y0,y1 are solutions obtained from solving the variational equation of a 2-dimension nonlinear system for periodic solutions via scipy.integrat.solve_ivp. The first step in cost of reshaping the t array is necessary since scipy's solution object requires the shape to be (n,).

# Define fundamental solution.
def W(t: np.ndarray) -> np.ndarray:
    if type(t) is np.ndarray: return np.stack([y0.sol(t).T,  y1.sol(t).T], axis=-1).swapaxes(1,2)
    else: return np.array([y0.sol(t),  y1.sol(t)])

# Define monodromy matrix and related quantities.
C = W(period)
C1 = np.eye(C.shape[0]) - C
C2 = np.linalg.solve(C, C1)

# Define cost function.
def cost(t):
    if len(t.shape) > 1:
        t = np.reshape(t, (t.shape[1],))
    B = W(t)
    def H(s: float, t: np.ndarray=t, B: np.ndarray=B):
        mask_s_le_t = s <= t
        W_s = W(s)
        X0 = np.linalg.solve(C1 @ W_s, B)
        X1 = np.linalg.solve(C2 @ W_s, B)
        return np.where(mask_s_le_t[:, None, None], X0, X1)
    return -quad_vec(lambda s: np.sum(H(s)**2, axis=(1,2)), 0, period)[0]

# Maximize.
maxi = -minimize(cost, bounds=[(0,period)], atol=atol, tol=rtol, updating='deferred', vectorized=1).fun

Solution

I have a curious case. Unfortunately, I did not manage to reproduce the computation times in the MWE below but my project is basically a more crowded version. I wanted to speedup my code so I tried out differential_evolution with worker=-1. This did not work because of pickling issues (AttributeError: Can't pickle local object); I don't really want to define my cost-function at the top level of my module. So I used worker=Pool(ncpu).map instead.

I have personally found that the multiprocessing support built into the standard library works better than pathos. I would suggest solving this problem by making cost() picklable.

You mention that you don't want to define your cost function at the module top level, but this is not the only way to approach the problem. If you have state that you want to pass along within a function that you want to pickle, you have three options for how to solve this: functools.partial(), SciPy args, and defining methods on a class. (Technically, you can also do this by defining the variable as a global, as you do in Approach #1, but as this is not portable, I don't include it.)

Of these three approaches, defining methods on a class can make this picklable without defining your cost function at top level.

Example:

from os import cpu_count
from time import time
import numpy as np
from multiprocessing import Pool
from scipy.optimize import differential_evolution
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

atol=1e-13
rtol=1e-10
T = 10
bounds = [(-T,T)]


# --- Approach 3 --- #

class A():
    def f(self, t):
        return self.cs(t)**2
    def define_f(self):
        x = np.linspace(0,10*2*np.pi,100)
        y = np.cos(x)
        cs = CubicSpline(x, y)
        self.cs = cs
        return self.f


class B():
    def cost(self, t):
        t = t[0]
        q = 0
        for _ in range(50):
            q += quad(lambda s: self.f(s*t), -T, T)[0]
        return q
    def solve(self, f):
        ctime = time()
        self.f = f
        with Pool(cpu_count()) as pool:
            result = differential_evolution(self.cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=pool.map)
        print("Approach 3:", time()-ctime)

if __name__ == '__main__':
    f = A().define_f()
    B().solve(f=f)

On my system, this is actually faster than approach 1:

Approach 1: 11.039157629013062
Approach 2: 11.278709650039673
Approach 3: 10.805178165435791

Note that workers=multiprocessing.Pool(cpu_count()) is roughly equivalent to workers=-1, since differential_evolution() uses the built in multiprocessing support by default.