I would like to run multiple MPI ranks on a single GPU (NVIDIA K20c), and I am aware of the existence of MPS and Kepler's Hyper-Q.
However, my question is: is Hyper-Q by itself enough for my needs, or do I have to use MPS? According to the Hyper-Q link above, "No extra coding effort is necessary to enable Hyper-Q. All it takes is a Tesla K20 GPU with a CUDA 5 installation and setting an environment variable to let multiple MPI ranks share the GPU – Hyper-Q is then ready to use."
Does this mean that I don't need MPS at all?
P.S. I am also aware of the following question on a similar topic, but it does not seem to answer my question clearly: "Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI?"
Thanks.
You can run multiple MPI ranks on a single GPU without MPS. In that case, the GPU code from the different ranks will serialize: a given rank's GPU code will only begin to execute when the GPU code associated with the previous rank has completely finished and exited the GPU.
If you want any opportunity for the GPU code from one rank to execute concurrently with the GPU code from another rank, then MPS is necessary. If the GPU code associated with a rank makes full use of the GPU, then you're not likely to see much benefit from MPS. The significant benefit will be observed when each rank's GPU code leaves part of the GPU idle, so that it can actually execute concurrently with the GPU code of another rank.
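To make that trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions of mine, not a description of how the GPU scheduler really behaves: every rank launches one kernel of the same duration, each kernel occupies a fixed fraction of the GPU, and kernels overlap perfectly under MPS as long as their combined fraction fits. The function names `serialized_time` and `mps_time` are hypothetical and exist only for this illustration.

```python
import math


def serialized_time(num_ranks, kernel_time):
    """Without MPS: each rank's GPU work starts only after the previous rank's has finished."""
    return num_ranks * kernel_time


def mps_time(num_ranks, kernel_time, gpu_fraction):
    """With MPS (toy model): kernels run side by side while their combined GPU share fits.

    gpu_fraction is the assumed share of the GPU that one rank's kernel occupies (0 < f <= 1).
    """
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in (0, 1]")
    # Number of kernels that fit on the GPU at once; the epsilon guards against float rounding.
    concurrent = max(1, math.floor(1 / gpu_fraction + 1e-9))
    # Ranks execute in "waves" of `concurrent` kernels each.
    waves = math.ceil(num_ranks / concurrent)
    return waves * kernel_time


# Four ranks, each with a 10 ms kernel.
print(serialized_time(4, 10.0))   # 40.0: no MPS, everything serializes
print(mps_time(4, 10.0, 1.0))     # 40.0: each kernel fills the GPU, so MPS does not help
print(mps_time(4, 10.0, 0.25))    # 10.0: four quarter-GPU kernels overlap completely
```

In this simplified picture, MPS gains nothing when each rank's kernel already fills the GPU, and the gain grows as the per-rank kernels get smaller. Real speedups depend on the actual kernels and hardware, so measure them rather than relying on this model.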