I've installed `nvidia-container-runtime` on my machine (Ubuntu 22.04) and can access the GPU through `docker run`:
```
$ docker run -it --rm --gpus all selenium/node-chrome:3.141.59 nvidia-smi
Mon Oct 24 00:32:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
|  0%   41C    P8    44W / 370W |     68MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
However, when running with the following `docker-compose.yml`, `nvidia-smi` can't be found. Applications inside the container don't seem to be using the GPU either.
```yaml
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: ["nvidia-smi"]
```
Running `docker-compose up`:

```
$ docker-compose up
[+] Running 1/0
 ⠿ Container docker-compose-gpu-nvidia-1  Recreated                        0.0s
Attaching to docker-compose-gpu-nvidia-1
Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
```
If I swap the selenium image for `nvidia/cuda`, `docker-compose` can see the GPU. Why is the GPU accessible with `docker run` but not with `docker-compose`?
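One difference that may be worth testing, assuming the legacy `runtime: nvidia` path is what takes effect here: the NVIDIA runtime decides which devices and driver files (including `nvidia-smi`) to inject based on the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables. The `nvidia/cuda` images set these in their image configuration, whereas the selenium image presumably does not. A sketch that sets them explicitly (not verified against this exact setup):

```yaml
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    # Legacy NVIDIA runtime path: its hook reads the env vars below
    runtime: nvidia
    environment:
      # Expose every GPU on the host (the nvidia/cuda images set this to "all")
      - NVIDIA_VISIBLE_DEVICES=all
      # "utility" covers nvidia-smi/NVML; "compute" covers CUDA libraries
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    command: ["nvidia-smi"]
```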
Specifying `driver` and `count` fixed this:
```yaml
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    command: ["nvidia-smi"]
```
I'm not sure why this worked; the Compose docs seem to indicate that omitting these will just use all available GPUs.
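If the goal is to mirror `--gpus all` rather than reserve a single GPU, the Compose specification also accepts `all` as the value of `count` (and `count` and `device_ids` are mutually exclusive, so set only one). A sketch, assuming a Compose version that implements the Compose specification (e.g. Compose V2, which the `[+] Running` output suggests). It also leaves out `runtime: nvidia`, on the assumption that the device reservation alone is enough, as in Docker's own Compose GPU example:

```yaml
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # "all" reserves every GPU on the host, like `docker run --gpus all`
              count: all
              capabilities: [gpu]
    command: ["nvidia-smi"]
```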