mpi, distributed-computing, gpgpu, hpc, amd-rocm

Limit MPI to a single GPU even in a single-node multi-GPU setup


I am new to distributed computing and I am trying to run a program which uses MPI and ROCm (AMD's framework for running code on GPUs).

The command I am using to run the program is mpirun -np 4 ./a.out

But by default it runs on both of the GPUs available in my machine. Is there a way to make it run on only a single GPU, and if yes, how?

Thanks in Advance :)


Solution

  • You can control which GPU(s) are active by setting certain environment variables (e.g. GPU_DEVICE_ORDINAL, ROCR_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES; see this or this for more details).

    For instance:

    export HIP_VISIBLE_DEVICES=0
    mpirun -np 4 ./a.out
    # or 
    HIP_VISIBLE_DEVICES=0 mpirun -np 4 ./a.out
    

    Be careful: some MPI implementations do not export all environment variables, or may reload your .bashrc or .cshrc. Using your MPI implementation's own syntax for setting environment variables is therefore safer:

    # with openmpi 
    mpirun -x HIP_VISIBLE_DEVICES=0 -np 4 ./a.out
    
    # or with mpich
    mpiexec -env HIP_VISIBLE_DEVICES 0 -n 4 ./a.out
    

    To be on the safe side, it's probably a good idea to add a check like this to your C++ code:

    #include <cstdlib>
    #include <iostream>
    // ...
    const char* hip_visible_devices = std::getenv("HIP_VISIBLE_DEVICES");
    if (hip_visible_devices)
        std::cout << "Running on GPUs: " << hip_visible_devices << std::endl;
    else
        std::cout << "Running on all GPUs!" << std::endl;
    
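    Going one step further, you could also set the variable from within the program itself when the launcher failed to propagate it. This is a sketch under one important assumption: it must run before the first HIP call, since the HIP runtime reads the variable once at initialization (the helper name `pinToSingleGpu` is mine, not a HIP API):

    ```cpp
    #include <cstdlib>
    #include <iostream>

    // Ensure HIP will see only one GPU even if the MPI launcher dropped the
    // variable. Must be called before the first HIP runtime call.
    const char* pinToSingleGpu(const char* ordinal = "0") {
        // overwrite = 0: respect a value the user already exported
        setenv("HIP_VISIBLE_DEVICES", ordinal, /*overwrite=*/0);
        return std::getenv("HIP_VISIBLE_DEVICES");
    }

    int main() {
        std::cout << "Running on GPUs: " << pinToSingleGpu() << std::endl;
        return 0;
    }
    ```

    With `overwrite = 0`, an explicitly exported `HIP_VISIBLE_DEVICES` still wins, so this only acts as a fallback.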

    (Note that CUDA has both an environment variable and a C function, cudaSetDevice(id); the HIP equivalent of the latter is hipSetDevice(id).)
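    For programmatic selection, HIP's hipSetDevice() plays the same role as cudaSetDevice(). A minimal sketch, assuming ROCm/HIP is installed (the helper name `pinToDeviceZero` is mine):

    ```cpp
    #include <hip/hip_runtime.h>  // ROCm HIP runtime; requires a ROCm install
    #include <cstdio>

    // Pin the calling process to HIP device 0 and return the active ordinal
    // (or -1 on failure). Programmatic counterpart of HIP_VISIBLE_DEVICES=0.
    int pinToDeviceZero() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count == 0) return -1;
        if (hipSetDevice(0) != hipSuccess) return -1;
        int current = -1;
        hipGetDevice(&current);
        return current;
    }

    int main() {
        int dev = pinToDeviceZero();
        if (dev < 0) { std::fprintf(stderr, "no usable HIP device\n"); return 1; }
        std::printf("pinned to device %d\n", dev);
        return 0;
    }
    ```

    Unlike the environment variable, hipSetDevice() is per-process, so with MPI each rank would call it after MPI_Init (e.g. every rank passing 0 reproduces the single-GPU behaviour above).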