I'm attempting to deploy a Dask application on Kubernetes/Azure. I have a Flask application server that acts as the client of a Dask scheduler and its workers.
I installed the Dask operator as described here:
helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator
This created the scheduler and worker pods, and they are running on Kubernetes without errors.
For the Flask application, I have a Docker image with the following Dockerfile:
FROM daskdev/dask
RUN apt-get -y install python3-pip
RUN pip3 install flask
RUN pip3 install gunicorn
RUN pip3 install "dask[complete]"
RUN pip3 install "dask[distributed]" --upgrade
RUN pip3 install "dask-ml[complete]"
Whenever I try to run a function on the workers using the Client interface, I get this error in the scheduler pod:
TypeError: update_graph() got an unexpected keyword argument 'graph_header'
It seems to me that the Dask image used to run Flask and the Dask Kubernetes package that I installed are not compatible or aligned.
How can I create an image for the Flask server that includes Dask and can be integrated with the Dask Kubernetes package?
I ran client.get_versions(check=True) in Flask, and this is what I got:
{'scheduler': {'host': {'python': '3.8.15.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-1105-azure', 'machine': 'x86_64', 'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'C.UTF-8', 'LANG': 'C.UTF-8'}, 'packages': {'python': '3.8.15.final.0', 'dask': '2023.1.0', 'distributed': '2023.1.0', 'msgpack': '1.0.4', 'cloudpickle': '2.2.0', 'tornado': '6.2', 'toolz': '0.12.0', 'numpy': '1.24.1', 'pandas': '1.5.2', 'lz4': '4.2.0'}}, 'workers': {'tcp://10.244.0.3:40749': {'host': {'python': '3.8.15.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-1105-azure', 'machine': 'x86_64', 'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'C.UTF-8', 'LANG': 'C.UTF-8'}, 'packages': {'python': '3.8.15.final.0', 'dask': '2023.1.0', 'distributed': '2023.1.0', 'msgpack': '1.0.4', 'cloudpickle': '2.2.0', 'tornado': '6.2', 'toolz': '0.12.0', 'numpy': '1.24.1', 'pandas': '1.5.2', 'lz4': '4.2.0'}}, 'tcp://10.244.0.4:36757': {'host': {'python': '3.8.15.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-1105-azure', 'machine': 'x86_64', 'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'C.UTF-8', 'LANG': 'C.UTF-8'}, 'packages': {'python': '3.8.15.final.0', 'dask': '2023.1.0', 'distributed': '2023.1.0', 'msgpack': '1.0.4', 'cloudpickle': '2.2.0', 'tornado': '6.2', 'toolz': '0.12.0', 'numpy': '1.24.1', 'pandas': '1.5.2', 'lz4': '4.2.0'}}, 'tcp://10.244.1.7:40561': {'host': {'python': '3.8.15.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-1105-azure', 'machine': 'x86_64', 'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'C.UTF-8', 'LANG': 'C.UTF-8'}, 'packages': {'python': '3.8.15.final.0', 'dask': '2023.1.0', 'distributed': '2023.1.0', 'msgpack': '1.0.4', 'cloudpickle': '2.2.0', 'tornado': '6.2', 'toolz': '0.12.0', 'numpy': '1.24.1', 'pandas': '1.5.2', 'lz4': '4.2.0'}}}, 'client': {'host': {'python': '3.8.16.final.0', 'python-bits': 64, 'OS': 'Linux', 'OS-release': '5.4.0-1105-azure', 'machine': 'x86_64', 
'processor': 'x86_64', 'byteorder': 'little', 'LC_ALL': 'C.UTF-8', 'LANG': 'C.UTF-8'}, 'packages': {'python': '3.8.16.final.0', 'dask': '2023.4.0', 'distributed': '2023.4.0', 'msgpack': '1.0.5', 'cloudpickle': '2.2.1', 'tornado': '6.2', 'toolz': '0.12.0', 'numpy': '1.23.5', 'pandas': '2.0.0', 'lz4': '4.3.2'}}} @ 2023-04-20 13:33:09.921545"}
Solved: I forced the Dockerfile to use Dask version 2023.1.0 so that it matches the Dask version deployed by the operator (the client had been on 2023.4.0, as the output above shows). That fixed the problem.
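For anyone hitting the same error, a small helper can make this kind of mismatch easier to spot than reading the raw dictionary. The following is only a sketch: find_mismatches is a hypothetical name, not part of Dask, and it assumes the input has the same shape as the get_versions() output shown above. The sample data is a trimmed copy of that output (one worker, three packages).

```python
# Sketch: list packages whose version differs between client, scheduler and workers.
# `versions` is assumed to have the shape of the client.get_versions() output above.
# find_mismatches is a hypothetical helper, not a Dask API.

def find_mismatches(versions, packages=("dask", "distributed")):
    """Return {package: {role_or_worker_address: version}} for packages that differ."""
    mismatches = {}
    for name in packages:
        seen = {
            "client": versions["client"]["packages"].get(name),
            "scheduler": versions["scheduler"]["packages"].get(name),
        }
        # Each worker is keyed by its address, e.g. "tcp://10.244.0.3:40749".
        for address, info in versions.get("workers", {}).items():
            seen[address] = info["packages"].get(name)
        # More than one distinct version means the deployment is not aligned.
        if len(set(seen.values())) > 1:
            mismatches[name] = seen
    return mismatches


# Trimmed copy of the output above.
sample = {
    "scheduler": {"packages": {"dask": "2023.1.0", "distributed": "2023.1.0", "tornado": "6.2"}},
    "workers": {
        "tcp://10.244.0.3:40749": {
            "packages": {"dask": "2023.1.0", "distributed": "2023.1.0", "tornado": "6.2"}
        }
    },
    "client": {"packages": {"dask": "2023.4.0", "distributed": "2023.4.0", "tornado": "6.2"}},
}

for package, seen in find_mismatches(sample).items():
    print(package, seen)
```

In the Flask app, the same function could be called on the real client.get_versions() result at startup, so that a client image built against a newer Dask is reported before any task is submitted.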