docker cuda nvidia-docker

Why does nvidia-smi show the same CUDA version and driver version both inside and outside of a Docker container?


I installed nvidia-docker and, to test my installation, ran docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi. I get this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T2000 wi...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P0    10W /  N/A |   2294MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

The driver version and CUDA version are exactly the same as what I get when I run nvidia-smi outside the container in my regular terminal. My understanding of why the driver version is the same is that device drivers are hardware specific and thus aren't installed inside the container, and that nvidia-docker exists to allow software running inside the container to talk to the device drivers. Is this correct?

My main point of confusion is why the CUDA version is reported as 11.4 from inside the container. When I launch a bash terminal inside this container and look at the CUDA installation in /usr/local, I only see version 10.0, so why is nvidia-smi inside the container giving me the CUDA version installed on my host system?

I believe these questions display a fundamental misunderstanding either of how nvidia-smi works, or how nvidia-docker works, so could someone point me towards resources that might help me resolve this misunderstanding?


Solution

  • You can't have more than one GPU driver operational in this setting. Period. That driver is installed in the base machine. If you do something not recommended, like installing it or attempting to install it in the container, it is still the one in the base machine that is in effect for the base machine as well as the container. Note that anything reported by nvidia-smi pertains to the GPU driver only, and therefore uses the driver installed in the base machine, whether you run it inside or outside of the container. In particular, the "CUDA Version" field in the nvidia-smi header is the highest CUDA version that driver supports, not the version of any CUDA toolkit that is installed. There may be detailed reporting differences, like which GPUs are visible, but this doesn't affect the versions reported.

    The CUDA runtime version, by contrast, will be the one that is installed in the container (10.0 in your case, under /usr/local). Period. The runtime has no ability to inspect what is outside the container. If it happens to match what you see outside the container, then it is simply the case that you have the same configuration outside the container as well as inside.

    Probably most of your confusion would be resolved with this answer and perhaps your question is a duplicate of that one.
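
To make the distinction concrete, here is a minimal Python sketch, not an official tool. It reads the two driver-side values out of the nvidia-smi header line from the question and checks whether the toolkit version found in the container (10.0) is within what the driver reports. The helper names (parse_smi_header, as_tuple, runtime_supported) are made up for illustration, and the check assumes the usual rule that a driver can run CUDA runtimes up to the version it reports.

```python
import re

# Header line copied from the nvidia-smi output in the question.
HEADER = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"


def parse_smi_header(line):
    """Return (driver_version, driver_cuda_version) from an nvidia-smi header line.

    Both values describe the host driver; neither describes a CUDA toolkit.
    """
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if m is None:
        raise ValueError("not an nvidia-smi header line")
    return m.group(1), m.group(2)


def as_tuple(version):
    """Turn a dotted version string such as '11.4' into a comparable tuple (11, 4)."""
    return tuple(int(part) for part in version.split("."))


def runtime_supported(runtime_version, driver_cuda_version):
    """True if the runtime version does not exceed the version the driver reports.

    Assumption: the driver supports runtimes up to its reported CUDA version.
    """
    return as_tuple(runtime_version) <= as_tuple(driver_cuda_version)


driver, driver_cuda = parse_smi_header(HEADER)
print("driver:", driver, "| highest CUDA supported by driver:", driver_cuda)

# "10.0" is the toolkit version the question found under /usr/local in the container.
print("container runtime 10.0 usable:", runtime_supported("10.0", driver_cuda))
```

The header line is the same inside and outside the container because it comes from the one driver on the host. The only value that changes from container to container is the runtime version passed to runtime_supported.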