Tags: python, pandas, dask, dask-distributed, dask-dataframe

dask import errors, dataframe/client - version conflicts with pandas?


Not all versions of dask.dataframe and pandas are compatible. This has already been addressed in this question.

I have tried several combinations. With more recent dask versions where dask.dataframe works (e.g. dask 2023.2.0 and pandas 2.1.2 on Python 3.10.12), importing the Client fails:

from distributed import Client

  File "/mypath/progs/myprog.py", line 16, in <module> 
    from distributed import Client
  File "/usr/lib/python3/dist-packages/distributed/__init__.py", line 23, in <module> 
    from .deploy import Adaptive, LocalCluster, SpecCluster, SSHCluster 
  File "/usr/lib/python3/dist-packages/distributed/deploy/__init__.py", line 5, in <module> 
    from .local import LocalCluster 
  File "/usr/lib/python3/dist-packages/distributed/deploy/local.py", line 15, in <module> 
    from .utils import nprocesses_nthreads 
  File "/usr/lib/python3/dist-packages/distributed/deploy/utils.py", line 4, in <module> 
    from dask.utils import factors 
ImportError: cannot import name 'factors' from 'dask.utils' (/mypath/.local/lib/python3.10/site-packages/dask/utils.py)

I don't really think this is still related to pandas but who knows...

Does anybody have an idea what is going on here, and how I can import both dask.dataframe and the Client?


Solution

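  • If possible, update distributed to the same version as dask (the two are released at about the same time). The traceback shows distributed being loaded from /usr/lib/python3/dist-packages while dask comes from /mypath/.local/lib/python3.10/site-packages, so the two come from separate installations and are probably at different versions.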
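
To confirm a mismatch before reinstalling anything, the installed versions and locations can be compared without importing either package, so the check itself cannot hit the ImportError. This is only a minimal sketch: the helper names installed_version, package_location and versions_match are made up for illustration, and it assumes both packages were installed with metadata that importlib.metadata can read.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(package):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def package_location(package):
    """Return the path Python would import the package from, without importing it."""
    spec = find_spec(package)
    return spec.origin if spec is not None else None


def versions_match(first, second):
    """True only if both versions are known and identical."""
    return first is not None and first == second


dask_version = installed_version("dask")
distributed_version = installed_version("distributed")

print("dask       ", dask_version, package_location("dask"))
print("distributed", distributed_version, package_location("distributed"))

if not versions_match(dask_version, distributed_version):
    # Hypothetical message; the fix is to bring both packages to one version.
    print("Version mismatch: install matching releases of dask and distributed.")
```

If the versions differ, installing both in one step into the same environment, for example with python3 -m pip install --user --upgrade "dask[distributed]", should bring them in line. Whether a user-level install is the right choice here is an assumption; it depends on whether the system-wide distributed package under /usr/lib/python3/dist-packages can be upgraded or removed.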