Tags: c++, cuda, gpu, multi-gpu

Is it possible to execute multiple instances of a CUDA program on a multi-GPU machine?


Background:

I have written a CUDA program that processes sequences of symbols. The program processes all sequences in parallel, with the stipulation that all sequences are of the same length, so I sort my data into groups in which every sequence has the same length. The program processes one group at a time.

Question:

I am running my code on a Linux machine with 4 GPUs and would like to utilize all 4 by running 4 instances of my program (1 per GPU). Is it possible to have the program select a GPU that isn't in use by another CUDA application? I don't want to hardcode anything that would cause problems down the road when the program is run on different hardware with more or fewer GPUs.


Solution

  • The environment variable CUDA_VISIBLE_DEVICES is your friend.

    I assume you have as many terminals open as you have GPUs. Let's say your application is called myexe.

    Then in one terminal, you could do:

    CUDA_VISIBLE_DEVICES="0" ./myexe
    

    In the next terminal:

    CUDA_VISIBLE_DEVICES="1" ./myexe
    

    and so on.

    Then the first instance will run on the first GPU enumerated by CUDA. The second instance will run on the second GPU (only), and so on.
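
    If you prefer to launch everything from a single terminal, a short bash loop achieves the same thing (a sketch, assuming 4 GPUs and that myexe needs no arguments):

    for i in 0 1 2 3; do
        CUDA_VISIBLE_DEVICES="$i" ./myexe &   # one instance per GPU, in the background
    done
    wait                                      # block until all four instances finish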

    Assuming bash, and for a given terminal session, you can make this "permanent" by exporting the variable:

    export CUDA_VISIBLE_DEVICES="2"
    

    Thereafter, all CUDA applications run in that session will observe only the third enumerated GPU (enumeration starts at 0), and they will observe that GPU as if it were device 0 in their session.

    This means you don't have to make any changes to your application for this method, assuming your app uses the default GPU or GPU 0.
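
    You can see this from inside a CUDA program. Here is a minimal sketch using the standard CUDA runtime API (the source file name is up to you); run under CUDA_VISIBLE_DEVICES="2", it reports a single device, and device 0 is the third physical GPU:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);        // counts only the GPUs the mask leaves visible
        printf("visible devices: %d\n", count);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0); // device 0 = the first GPU the mask exposes
        printf("device 0: %s\n", prop.name);
        return 0;
    }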

    You can also extend this to make multiple GPUs available, for example:

    export CUDA_VISIBLE_DEVICES="2,4"
    

    means the GPUs that would ordinarily enumerate as 2 and 4 would now be the only GPUs "visible" in that session and they would enumerate as 0 and 1.
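
    To confirm the remapping, the same enumeration approach can print each visible device's PCI bus ID (a sketch; with the export above, indices 0 and 1 report the bus IDs of physical GPUs 2 and 4):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);            // 2 when CUDA_VISIBLE_DEVICES="2,4"
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            // the PCI bus ID identifies the physical card behind the remapped index
            printf("device %d: %s (PCI bus %d)\n", d, prop.name, prop.pciBusID);
        }
        return 0;
    }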

    In my opinion the above approach is the easiest. Selecting a GPU that "isn't in use" is problematic because:

    1. We need a definition of "in use".
    2. A GPU that was in use at a particular instant may not be in use immediately after that.
    3. Most importantly, a GPU that is not "in use" could become "in use" asynchronously, meaning you are exposed to race conditions.

    So the best advice (IMO) is to manage the GPUs explicitly. Otherwise you need some form of job scheduler (outside the scope of this question, IMO) to be able to query unused GPUs and "reserve" one before another app tries to do so, in an orderly fashion.
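
    That said, if every competing instance is under your control, one lightweight way to "reserve" a GPU atomically is a per-GPU lock file. The wrapper below is a hypothetical sketch (the script name, lock paths, and GPU count are my assumptions, and it only coordinates processes that all launch through it):

    #!/usr/bin/env bash
    # pick-gpu.sh (hypothetical): run a command on the first unreserved GPU.
    NUM_GPUS=4                              # assumption: set to match your machine
    for i in $(seq 0 $((NUM_GPUS - 1))); do
        exec 9>"/tmp/gpu${i}.lock"          # open a lock file for GPU i on fd 9
        if flock -n 9; then                 # non-blocking: succeeds only if GPU i is unclaimed
            CUDA_VISIBLE_DEVICES="$i" "$@"  # the lock is held until this process exits
            exit $?
        fi
        exec 9>&-                           # GPU i is taken; close the fd, try the next one
    done
    echo "no free GPU" >&2
    exit 1

    Invoked as ./pick-gpu.sh ./myexe, each instance either acquires a lock (and its GPU) atomically or moves on to the next, which avoids the race described above.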