My system has 4 CPUs and 16 GB of RAM. My aim is to deploy Dask distributed workers that each use ONLY 1 CPU to run the code assigned to them.
I am deploying a scheduler container and worker containers with Docker to run code that uses Dask Delayed and Dask DataFrames.
Following are the Docker run commands for both:
Scheduler
docker run --name dask-scheduler --network host daskdev/dask:2023.1.1-py3.10 dask-scheduler
Worker (I tried multiple docker run commands, all different combinations of the following)
docker run --name dask-worker --network host daskdev/dask:2023.1.1-py3.10 dask-worker tcp://10.76.8.50:8786 --nworkers 1 --nthreads 1 --resources "process=1" --memory-limit 2GiB
Creating the client and establishing a connection with the scheduler:
from dask.distributed import Client
client = Client("tcp://10.76.8.50:8786")
Now, what I wanted was that when dask.compute(scheduler="processes") is run, the worker uses only 1 CPU to run the code. However, at least 3 CPUs can be seen at 100% capacity.
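For reference, the compute call is roughly this shape (the dataframe and column are simplified stand-ins for the real pipeline):

import dask
import dask.dataframe as dd
import pandas as pd

# stand-in for the real dataframe workload
df = dd.from_pandas(pd.DataFrame({"x": range(1000)}), npartitions=4)
total = df["x"].sum()

# the call in question
result = dask.compute(total, scheduler="processes")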
Is there something I have missed?
How could I limit a distributed worker to use ONLY 1 CPU for this compute work?
You are specifying scheduler="processes", which completely ignores your distributed cluster and instead spins up a local multiprocessing pool on the machine where the client code runs; that is why several local CPUs hit 100%. Instead, create a distributed client in the Python session where you are running your code, client = dask.distributed.Client(...), with the TCP address of the scheduler, and the compute will run on your workers, one thread each, automatically.
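A minimal sketch, assuming the scheduler is reachable at the address above (the delayed function is just a placeholder for your real task):

import dask
from dask.distributed import Client

# instantiating the Client registers the cluster as Dask's default scheduler
client = Client("tcp://10.76.8.50:8786")

@dask.delayed
def work(x):
    # placeholder for the real computation
    return x * 2

tasks = [work(i) for i in range(8)]

# no scheduler= argument: compute is routed to the distributed workers,
# each of which runs a single thread (--nthreads 1), i.e. roughly 1 CPU apiece
results = dask.compute(*tasks)
print(results)

The same applies to dask dataframes: once the Client exists, ddf.compute() (with no scheduler= argument) runs on the workers rather than in local processes.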