I’m trying to figure out whether Ray will work for an application, and I want to understand how dependencies get to the workers in a Ray cluster. For example, let’s say I have:
    @ray.remote
    def foo():
        a = do_something_requiring_pandas()
        b = do_something_requiring_openmpi()
        return a + b
How do I make sure the workers have access to pandas (a third-party Python package) and openmpi (a non-Python package usually installed via the OS package manager)? Do I just have to ensure that the workers have them installed “out of band” from Ray? Or does Ray do some automagic packaging of dependencies that it sends to the worker along with the task (I can see how that could work in the pandas case, but not the openmpi one)? I don’t actually care about pandas or openmpi specifically; they’re just handy examples of two different categories of dependency.
For the case of pip and conda dependencies, Runtime Environments (https://docs.ray.io/en/master/advanced.html#runtime-environments-experimental) looks like the feature you're looking for. It's very new, so feedback is welcome!
EDIT by romeo, 27th of Feb '23
It seems the link has moved to https://docs.ray.io/en/latest/ray-core/handling-dependencies.html
EDIT
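For the pip case, a minimal sketch of what using a runtime environment might look like is below. The unpinned `pandas` entry and the helper name `run_with_runtime_env` are just illustrative assumptions, and since the feature is new the exact options may differ between Ray versions, so check the docs linked above.

```python
# Hypothetical dependency list for illustration; pin versions for real use.
RUNTIME_ENV = {"pip": ["pandas"]}


def run_with_runtime_env():
    """Start Ray with a runtime environment and run one task that needs pandas."""
    # Imported lazily so this module can be loaded on a machine without Ray.
    import ray

    # Ray installs the listed pip packages for the workers of this job.
    ray.init(runtime_env=RUNTIME_ENV)

    @ray.remote
    def foo():
        # The import happens on the worker, inside the runtime environment.
        import pandas as pd
        return pd.__version__

    return ray.get(foo.remote())
```

Calling `run_with_runtime_env()` on a machine connected to a cluster should return the pandas version seen by the worker. A `runtime_env` can also be passed per task via `@ray.remote(runtime_env=...)` if only some tasks need the package.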
For something like openmpi, you could try https://docs.ray.io/en/master/cluster/config.html#setup-commands to ensure openmpi is installed on each node in your cluster.
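Since setup commands install things outside of Ray's control, it can be useful to verify from inside a task that the system dependency is really there. The following is only a sketch: the function names are made up for illustration, and it assumes Open MPI is detected by finding its `mpirun` launcher on the worker's `PATH`.

```python
import shutil


def missing_system_deps(commands):
    """Return the executables from `commands` that are not on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def check_on_worker(commands=("mpirun",)):
    """Run the PATH check as a Ray task on some node of the cluster."""
    # Imported lazily so the local check above works without Ray installed.
    import ray

    ray.init(ignore_reinit_error=True)
    remote_check = ray.remote(missing_system_deps)
    return ray.get(remote_check.remote(list(commands)))
```

An empty list from `check_on_worker()` means the task found `mpirun` on the node it ran on. Note that this checks only the single node where the task happened to be scheduled, not every node in the cluster.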