I'm trying to run Dask on a cluster that uses SLURM. The client is created and scaled successfully; however, at the line
with joblib.parallel_backend('dask'):
the operation hits a worker timeout, and I get the following error from the SLURM jobs:
/usr/bin/python3: Error while finding module specification for 'distributed.cli.dask_worker' (ModuleNotFoundError: No module named 'distributed')
I have checked that distributed is installed on the cluster's nodes, and I can import it in Python without any issues. Does anyone know why distributed is causing problems here?
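The traceback comes from the worker's interpreter failing the module lookup for distributed.cli.dask_worker, so one thing worth confirming from inside a SLURM job (e.g. via srun) is that the interpreter the workers actually launch can see distributed. A minimal sketch of that check (module_available is just an illustrative helper, not part of Dask):

```python
import importlib.util
import sys

def module_available(name: str) -> bool:
    """Return True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

# Which Python is this? On a worker node this should be the same
# interpreter that has distributed installed.
print(sys.executable)
print(module_available("distributed"))
```

If the sys.executable printed inside the job differs from the interpreter where distributed was installed (for example /usr/bin/python3 versus a conda environment's python), the workers are picking up a different Python, which would produce exactly this ModuleNotFoundError.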
Making a fresh conda environment with dask[complete] installed seems to have fixed it.