dask distributed: How to increase timeout for worker connections? connect() didn't finish in time


OSError: Timed out trying to connect to 'tcp://127.0.0.1:40475' after 10 s: Timed out trying to connect to 'tcp:// 8.56.11:40475' after 10 s: connect() didn't finish in time

I have some huge operations running and would like to increase the timeout through the configuration. But which configuration option is actually used here?

I tried:

import os

os.environ["DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT"] = "33s"
os.environ["DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP"] = "35s"
os.environ["DASK_DISTRIBUTED__DEPLOY__LOST_WORKER"] = "34s"

but this had no effect (the timeout is still 10 seconds).


Solution

  • The answer is in ~/.dask/config.yaml. The 10 s limit in the error message comes from connect-timeout, so raise that value:

    # Communication options
    connect-timeout: 10      # seconds delay before connecting fails
    tcp-timeout: 30         # seconds delay before calling an unresponsive connection dead
    default-scheme: tcp
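
If editing the YAML file is not convenient, the same values can probably be set from Python. The sketch below assumes a recent dask/distributed release, where these settings live under the nested keys distributed.comm.timeouts.connect and distributed.comm.timeouts.tcp rather than the flat connect-timeout / tcp-timeout names shown above. The 60s and 120s values are arbitrary examples, not recommendations.

```python
import dask

# Assumed key names for recent dask/distributed releases; older releases
# used the flat connect-timeout / tcp-timeout keys from config.yaml.
new_timeouts = {
    "distributed.comm.timeouts.connect": "60s",  # example value
    "distributed.comm.timeouts.tcp": "120s",     # example value
}

# Apply the values process-wide. Do this before creating the Client
# or the cluster so that new connections pick them up.
dask.config.set(new_timeouts)

# Read the values back to confirm what dask will actually use.
connect_timeout = dask.config.get("distributed.comm.timeouts.connect")
tcp_timeout = dask.config.get("distributed.comm.timeouts.tcp")
print(connect_timeout, tcp_timeout)
```

This may also explain why the os.environ attempt in the question had no effect: dask reads DASK_* environment variables when it is imported, so variables set afterwards are ignored unless dask.config.refresh() is called or they are set before the import.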