apache-spark, azure-databricks, distributed-computing, ray

How to configure a Ray cluster to utilize the full capacity of a Databricks cluster


I have a Databricks cluster configured with a minimum of 1 worker and a maximum of 4 workers, with auto-scaling enabled. What should my Ray configuration (setup_ray_cluster) be to fully utilize the cluster's capacity?

import ray
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

try:
  shutdown_ray_cluster()
except Exception:
  pass

_, cluster_address = setup_ray_cluster(num_worker_nodes=4, autoscale=True)
ray.init(cluster_address, ignore_reinit_error=True)

Solution

  • As per the documentation:

    If your Ray version is below 2.10, num_worker_nodes is the maximum number of worker nodes when you set autoscale=True, and the minimum number of worker nodes defaults to zero.
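
    For instance, a call like the one in the question (a minimal sketch of the pre-2.10 behavior; autoscale and num_worker_nodes are taken from the question's own code):

    from ray.util.spark import setup_ray_cluster

    # Ray < 2.10: num_worker_nodes is the upper bound when autoscale=True,
    # so this cluster can scale between 0 and 4 worker nodes.
    setup_ray_cluster(num_worker_nodes=4, autoscale=True)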

    If your Ray version is 2.10 or above, you can set both the minimum and maximum number of worker nodes for the cluster, as shown below.

    import ray
    from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

    # Ray 2.10+: scale between 1 and 4 Ray worker nodes on the Databricks cluster.
    _, cluster_address = setup_ray_cluster(min_worker_nodes=1, max_worker_nodes=4,
                                           collect_log_to_path="/dbfs/path/to/ray_collected_logs")
    ray.init(cluster_address, ignore_reinit_error=True)
    

    The above code is also taken from the same documentation.

    If you don't specify min_worker_nodes, the cluster becomes a fixed-size cluster with max_worker_nodes worker nodes, for example:
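
    A minimal sketch assuming Ray 2.10+, with only max_worker_nodes given:

    from ray.util.spark import setup_ray_cluster

    # No min_worker_nodes specified: the cluster is created as a
    # fixed-size cluster with 4 worker nodes.
    setup_ray_cluster(max_worker_nodes=4)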

    You can see the same below, where I have set 2 and 8 as the minimum and maximum number of worker nodes.

    [Screenshot: setup_ray_cluster output with min_worker_nodes=2 and max_worker_nodes=8]
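
    Once the cluster is up, a quick sanity check (my own suggestion, not from the referenced documentation) is to ask Ray what resources it sees; the totals should match the Databricks cluster's capacity once all workers have joined:

    import ray

    # Total CPUs/GPUs/memory registered across all Ray nodes (head + workers).
    print(ray.cluster_resources())
    # Resources that are currently free for scheduling.
    print(ray.available_resources())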