I'm trying to parallelize the task of finding minimum cost paths through a raster cost surface, but I keep bumping into the same PicklingError: Could not pickle the task to send it to the workers.
This is a code example of what's going on:
import numpy as np
from skimage.graph import MCP_Geometric
import timeit
from joblib import Parallel, delayed
np.random.seed(123)
cost_surface = np.random.rand(1000, 1000)
mcp = MCP_Geometric(cost_surface)
pois = [(np.random.randint(0, 1000), np.random.randint(0, 1000)) for _ in range(20)]
def task(poi):
costs_array, traceback = mcp.find_costs(starts=[poi], ends=pois)
ends_idx = tuple(np.asarray(pois).T.tolist())
costs = costs_array[ends_idx]
tracebacks = [mcp.traceback(end) for end in pois]
Parallel(n_jobs=6)(delayed(task)(poi) for poi in pois)
I'm fairly new to parallelizing tasks, but the code I'm running might take weeks if done sequentially and want to leverage parallel. I understand that pickling is not possible for some complex objects, so I'm also looking for alternatives.
This is because MCP_Geometric is unpickable. You need to move initialization of this class into task
function:
import numpy as np
from skimage.graph import MCP_Geometric
import timeit
from joblib import Parallel, delayed
np.random.seed(123)
cost_surface = np.random.rand(1000, 1000)
pois = [(np.random.randint(0, 1000), np.random.randint(0, 1000)) for _ in range(20)]
def task(poi):
# --- each process will have it's own version of the `mcp` class
mcp = MCP_Geometric(cost_surface)
costs_array, traceback = mcp.find_costs(starts=[poi], ends=pois)
ends_idx = tuple(np.asarray(pois).T.tolist())
costs = costs_array[ends_idx]
tracebacks = [mcp.traceback(end) for end in pois]
Parallel(n_jobs=6)(delayed(task)(poi) for poi in pois)