My system has 4 CPUs and 16 GB of RAM. My aim is to deploy Dask distributed workers that each use ONLY 1 CPU to run the code assigned to them.
I am deploying a scheduler container and worker containers with Docker to run code that uses Dask Delayed and Dask DataFrames.
Following are the Docker run commands for both:
Scheduler
docker run --name dask-scheduler --network host daskdev/dask:2023.1.1-py3.10 dask-scheduler
Worker (I tried multiple docker run commands, all different combinations of the following)
docker run --name dask-worker --network host daskdev/dask:2023.1.1-py3.10 dask-worker tcp://10.76.8.50:8786 --nworkers 1 --nthreads 1 --resources "process=1" --memory-limit 2GiB
Creating the client and establishing a connection with the scheduler:
from dask.distributed import Client
client = Client("tcp://10.76.8.50:8786")
Now, what I wanted was that when dask.compute(scheduler="processes") is run, the worker uses only 1 CPU to run the code. However, at least 3 CPUs can be seen at 100% capacity.
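For reference, the compute call is roughly this shape (the dataframe and column are simplified stand-ins for the real pipeline):

import dask
import dask.dataframe as dd
import pandas as pd

# stand-in for the real dataframe workload
df = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=4)
total = df["x"].sum()

# the call in question
result = dask.compute(total, scheduler="processes")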
Is there something I have missed?
How could I limit a distributed worker to use ONLY 1 CPU for this compute work?
You are specifying scheduler="processes", which completely ignores your distributed cluster and instead spins up a local multiprocessing pool on the machine where the client code runs; that is why several local CPUs hit 100%. Instead, create a distributed client in the Python session where you are running your code, client = dask.distributed.Client(...), with the TCP address of the scheduler, and the compute will run on your workers, one thread each, automatically.
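A minimal sketch, assuming the scheduler is reachable at the address above (the delayed function is just a placeholder for your real task):

import dask
from dask.distributed import Client

# instantiating the Client registers the cluster as Dask's default scheduler
client = Client("tcp://10.76.8.50:8786")

@dask.delayed
def work(x):
    # placeholder for the real computation
    return x * 2

tasks = [work(i) for i in range(8)]

# no scheduler= argument: compute is routed to the distributed workers,
# each of which runs a single thread (--nthreads 1), i.e. roughly 1 CPU apiece
results = dask.compute(*tasks)
print(results)

The same applies to dask dataframes: once the Client exists, ddf.compute() (with no scheduler= argument) runs on the workers rather than in local processes.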