python, dask, dask-distributed

dask: What does memory_limit control?


In dask's LocalCluster, there is a parameter memory_limit. I can't find details in the documentation (https://distributed.dask.org/en/latest/worker.html#memory-management) about whether the limit applies per worker, per thread, or to the whole cluster. This is probably at least in part because I have trouble following how keywords are passed around.

For example, in this code:

cluster = LocalCluster(n_workers=2,
                       threads_per_worker=4,
                       memory_target_fraction=0.95,
                       memory_limit='32GB')

will that be 32 GB for each worker? For both workers together? Or for each thread?

My question is motivated partly by running a LocalCluster with n_workers=1 and memory_limit='32GB' that nevertheless gets killed by the Slurm out-of-memory killer for using too much memory.
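One unit detail worth checking in the Slurm case (my own observation, not from the linked docs): dask's parse_bytes treats "GB" as decimal gigabytes and "GiB" as binary gibibytes, so a '32GB' dask limit is smaller than a 32 GiB Slurm allocation, and the two won't line up exactly:

```python
from dask.utils import parse_bytes

# dask interprets "GB" as decimal (10**9 bytes) and "GiB" as binary
# (2**30 bytes), so '32GB' is noticeably less than 32 GiB.
print(parse_bytes("32GB"))   # 32000000000
print(parse_bytes("32GiB"))  # 34359738368
```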


Solution

  • The memory_limit keyword argument to LocalCluster sets the limit per worker.

    Related documentation: https://github.com/dask/distributed/blob/7bf884b941363242c3884b598205c75373287190/distributed/deploy/local.py#L76-L78

    Note that if the given memory_limit is greater than the available memory, each worker's limit is instead set to the total available memory. This behavior hasn't been documented yet, but a relevant issue is here: https://github.com/dask/dask/issues/8224

    Screenshot of cluster with code:

    from dask.distributed import LocalCluster, Client

    # memory_limit applies per worker: 2 workers x '8GB' = up to 16 GB total
    cluster = LocalCluster(n_workers=2,
                           threads_per_worker=4,
                           memory_target_fraction=0.95,
                           memory_limit='8GB')
    client = Client(cluster)
    client  # displays the cluster widget in a notebook
    

    (screenshot of the resulting cluster widget showing the two workers)
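To confirm programmatically that the limit is applied per worker rather than to the whole cluster, you can read each worker's reported memory_limit back from the scheduler. A minimal sketch (processes=False is used only to keep the example lightweight; it doesn't change how the limit is recorded):

```python
from dask.distributed import LocalCluster, Client

# Two in-process workers, each configured with a 1 GB limit.
cluster = LocalCluster(n_workers=2,
                       threads_per_worker=1,
                       processes=False,
                       memory_limit="1GB")
client = Client(cluster)

# Each worker reports its own limit, in bytes: the '1GB' string
# is parsed per worker, not divided across the cluster.
limits = [info["memory_limit"]
          for info in client.scheduler_info()["workers"].values()]
print(limits)

client.close()
cluster.close()
```

If the limit were cluster-wide, the reported per-worker values would sum to 1 GB; instead each worker independently reports the full 1 GB.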