Tags: docker, docker-compose, nvidia-docker

Docker compose: can't access GPU from compose (but can from run)


I've installed nvidia-container-runtime on my machine (Ubuntu 22.04), and can access the GPU through docker run.

docker run -it --rm --gpus all selenium/node-chrome:3.141.59 nvidia-smi
Mon Oct 24 00:32:32 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
|  0%   41C    P8    44W / 370W |     68MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

However, when running with the following docker-compose.yml, nvidia-smi can't be found. Applications inside the container don't seem to be using the GPU either.

version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command:
      ["nvidia-smi"]

Running docker-compose up gives:

[+] Running 1/0
 ⠿ Container docker-compose-gpu-nvidia-1  Recreated                                                            0.0s
Attaching to docker-compose-gpu-nvidia-1
Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown

If I swap the selenium image for nvidia/cuda, docker-compose can see the GPU. Why is the GPU accessible with docker run but not with docker-compose?


Solution

  • Specifying the driver and count in the device reservation fixed this:

    version: "3.8"
    services:
      nvidia:
        image: selenium/node-chrome:3.141.59
        runtime: nvidia
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
        command:
          ["nvidia-smi"]
    

    I'm not sure why this worked: the docs seem to indicate that omitting driver and count will just use all available GPUs.
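
    The Compose GPU documentation also lists all as an accepted value for count. Below is a minimal sketch of the same reservation requesting every GPU rather than exactly one. This variant is an assumption based on the docs and has not been tested against the setup in the question.

    ```yaml
    version: "3.8"
    services:
      nvidia:
        image: selenium/node-chrome:3.141.59
        runtime: nvidia
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia       # name the driver explicitly, as in the fix above
                  count: all           # assumption: request every GPU instead of exactly one
                  capabilities: [gpu]
        command: ["nvidia-smi"]
    ```

    Per the same docs, count and device_ids are mutually exclusive, so a reservation should set only one of them.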