Tags: gpu, gpgpu, sungridengine, multiple-gpu

Scheduling GPU resources using the Sun Grid Engine (SGE)


We have a cluster of machines, each with 4 GPUs. Each job should be able to ask for 1-4 GPUs. Here's the catch: I would like SGE to tell each job which GPU(s) it should take. Unlike a CPU, a GPU works best if only one process accesses it at a time. So I would like something like:

Job #1  GPU: 0, 1, 3
Job #2  GPU: 2
Job #4  wait until 1-4 GPUs are available

The problem I've run into is that SGE will let me create a GPU resource with 4 units on each node, but it won't explicitly tell a job which GPU(s) to use (only how many it gets: 1, 3, or whatever).

I thought of creating 4 resources (gpu0, gpu1, gpu2, gpu3), but I'm not sure whether the -l flag will take a glob pattern, and I can't figure out how SGE would tell the job which GPU resources it received. Any ideas?


Solution

  • When you have multiple GPUs and you want your jobs to request a GPU, but the Grid Engine scheduler should handle and select the free GPUs, you can configure an RSMAP (resource map) complex (instead of an INT). This allows you to specify both the amount and the names of the GPUs on a specific host in the host configuration. You can also set it up as a HOST consumable, so that independent of the number of slots the job requests, the amount of GPU devices requested with -l gpu=2 is 2 per host (even if the parallel job got, say, 8 slots on different hosts). A brief submission sketch follows the host configuration below.

    qconf -mc
        #name               shortcut   type        relop   requestable consumable default  urgency     
        #----------------------------------------------------------------------------------------------
        gpu                 gpu        RSMAP         <=      YES         HOST        0        0
    

    In the execution host configuration you can initialize your resources with ids/names (here simply GPU1 and GPU2).

    qconf -me yourhost
    hostname              yourhost
    load_scaling          NONE
    complex_values        gpu=2(GPU1 GPU2)
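
    For illustration, a job would then request GPUs from this complex at submission time roughly as follows. This is only a sketch: job.sh and the parallel environment name mypi are placeholders, not names defined by the configuration above.

        # Serial job asking for one GPU on whatever host the scheduler picks
        qsub -l gpu=1 job.sh

        # Parallel job: because "gpu" is a HOST consumable, -l gpu=2 reserves
        # two GPUs on each host the job runs on, independent of the slot count
        qsub -pe mypi 8 -l gpu=2 job.sh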
    

    Then, when a job requests -l gpu=1, the Univa Grid Engine scheduler will select GPU2 if GPU1 is already in use by a different job. You can see the actual selection in the qstat -j output. The job learns which GPU it was granted by reading the $SGE_HGR_gpu environment variable, which in this case contains the chosen id/name "GPU2". This can be used to access the right GPU without collisions.
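
    A minimal job-script sketch of reading that variable, assuming the granted names GPU1/GPU2 map to CUDA device indices 0/1 (this mapping is site-specific and not something Grid Engine defines for you):

        #!/bin/sh
        # $SGE_HGR_gpu holds the granted id(s)/name(s) from the RSMAP, e.g. "GPU2"
        echo "Granted GPU(s): $SGE_HGR_gpu"

        # Hypothetical translation of the granted names into CUDA device indices
        devices=""
        for g in $SGE_HGR_gpu; do
            case "$g" in
                GPU1) idx=0 ;;
                GPU2) idx=1 ;;
                *)    continue ;;   # unknown name, skip it
            esac
            devices="${devices:+$devices,}$idx"
        done
        export CUDA_VISIBLE_DEVICES="$devices"

        ./my_gpu_app    # placeholder for the actual GPU program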

    If you have a multi-socket host, you can even attach a GPU directly to the CPU cores near that GPU (near its PCIe bus) in order to speed up communication between the GPU and the CPUs. This is done by attaching a topology mask in the execution host configuration.

    qconf -me yourhost
    hostname              yourhost
    load_scaling          NONE
    complex_values        gpu=2(GPU1:SCCCCScccc GPU2:SccccSCCCC)
    

    Now, when the UGE scheduler selects GPU2, it automatically binds the job to all four cores (C) of the second socket (S), so the job is not allowed to run on the first socket. This does not even require the -binding qsub parameter.
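
    If you want to verify the implicit binding from inside the job, one way (assuming the usual Linux tools are present on the execution host) is to inspect the job script's own CPU affinity:

        # Show which cores this job script is allowed to run on
        taskset -cp $$

        # Or, if numactl is installed, show the allowed CPUs and NUMA nodes
        numactl --show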

    You can find more configuration examples on www.gridengine.eu.

    Note that all of these features are only available in Univa Grid Engine (8.1.0/8.1.3 and higher), not in SGE 6.2u5 or other Grid Engine versions (like OGE, Son of Grid Engine, etc.). You can try it out by downloading the 48-core limited free version from univa.com.