python ray

How to stop Ray from running multiple tasks on the same cluster node


I have a Ray cluster that is started manually on several nodes using ray start. How can I schedule tasks on the cluster so that they are exclusive, i.e., no two tasks run in parallel on the same node?

One option would be to specify each node as having only 1 CPU. Another would be to introduce a custom resource 'node', with 1 instance per node.

But this seems like a common scenario; is there a cleaner way to handle this?


Solution

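  • As suggested above, you can use custom resources: give each node exactly one unit of a resource and have every task request one unit, so at most one such task can run on a node at a time. For example,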

    In the terminal,

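    # Head node (the resources value must be valid JSON, so the name is quoted)
    ray start --head --resources='{"<name_of_resources>": 1}'
    # Worker nodes (<head_node_address> is a placeholder for the head node's address and port)
    ray start --address='<head_node_address>' --resources='{"<name_of_resources>": 1}'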
    

    In the Ray driver (the main Python entry point that calls ray.init),

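    import ray

    ray.init(address="auto")  # connect to the already running cluster
    # Each task claims the node's single unit of the custom resource
    @ray.remote(resources={"<name_of_resources>": 1})
    def ...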