Tags: cuda, gpu, gpgpu, cuda-context, multi-process-service

How to reduce CUDA context size (Multi-Process Service)


I followed Robert Crovella's example on how to use NVIDIA's Multi-Process Service. According to the docs:

2.1.2. Reduced on-GPU context storage

Without MPS, each CUDA process using a GPU allocates separate storage and scheduling resources on the GPU. In contrast, the MPS server allocates one copy of GPU storage and scheduling resources shared by all its clients.

I understood this to mean that each process's context becomes smaller because the context storage is shared, which would increase free GPU memory and thus allow more processes to run in parallel.
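For context, the client in an example like this is essentially any process that brings up a CUDA context and keeps it alive long enough for nvidia-smi to be checked. Below is a minimal sketch of such a client, not the exact code from the example; the file name mps_probe.cu, the buffer size, and the 30-second sleep are arbitrary choices:

```cuda
// mps_probe.cu -- hypothetical minimal client: bring up a CUDA context and
// keep it alive for ~30 s so per-process memory can be read from nvidia-smi.
//
// Build:        nvcc -o mps_probe mps_probe.cu
// Without MPS:  ./mps_probe & ./mps_probe & wait
// With MPS:     nvidia-cuda-mps-control -d            # start the MPS daemon
//               ./mps_probe & ./mps_probe & wait
//               echo quit | nvidia-cuda-mps-control   # stop the daemon
//
// While the processes run, `nvidia-smi` in another terminal shows their
// per-process memory usage.

#include <cstdio>
#include <unistd.h>
#include <cuda_runtime.h>

// Trivial kernel: touch the buffer so the context is fully initialized.
__global__ void touch(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2.0f * buf[i];
}

int main() {
    const int n = 1 << 20;                // ~4 MB user allocation
    float *d_buf = nullptr;

    cudaMalloc(&d_buf, n * sizeof(float));
    touch<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaDeviceSynchronize();

    printf("pid %d: context is up, sleeping so nvidia-smi can be checked\n",
           (int)getpid());
    sleep(30);                            // keep the context alive

    cudaFree(d_buf);
    return 0;
}
```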

Now, back to the example. Without MPS:

[screenshot: nvidia-smi output with MPS disabled]

And with MPS:

[screenshot: nvidia-smi output with MPS enabled]

Unfortunately, each process still takes virtually the same amount of memory (~300 MB). Doesn't this contradict the docs? Is there a way to decrease per-process memory consumption?


Solution

  • Oops, I asked too eagerly, before checking the memory usage on the other (pre-Volta) card, and there is actually a difference there. Let me post it here for future reference in case anyone else stumbles on this problem too:

    MPS off:

    [screenshot: nvidia-smi output with MPS disabled]

    MPS on:

    [screenshot: nvidia-smi output with MPS enabled]
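The two cards behave differently because MPS on pre-Volta GPUs routes client work through the MPS server proxy process, while on Volta and newer GPUs clients submit work directly. A quick way to see which case applies to a given card is to query its compute capability (pre-Volta means compute capability below 7.0). This is a minimal sketch; the file name cc_check.cu is my own choice:

```cuda
// cc_check.cu -- print the compute capability of each visible GPU so you
// know whether the pre-Volta MPS behaviour applies (compute capability < 7.0).
//
// Build: nvcc -o cc_check cc_check.cu

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               prop.major < 7 ? "pre-Volta MPS" : "Volta+ MPS");
    }
    return 0;
}
```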