dockerubuntunvidiavulkannvidia-docker

Vulkan is unable to detect Nvidia GPU from within a docker container when using the Nvidia Container Toolkit


My goal is to be able to run Vulkan application in a docker container using the Nvidia Container Toolkit. Ideally running Ubuntu 22.04 on the host and in the container.

I've created a git repo to allow others to better reproduce this issue: https://github.com/rickyjames35/vulkan_docker_test The README explains my findings but I will reiterate them here.

For this test I'm running Ubuntu 22.04 on my host as well as in the container FROM ubuntu:22.04. For this test I'm seeing that the only device vulkaninfo is finding is llvmpipe which is a CPU based graphics driver. I'm also seeing that llvmpipe can't render when running vkcube both in the container and on the host for Ubuntu 22.04. Here is the container output for vkcube:

Selected GPU 0: llvmpipe (LLVM 13.0.1, 256 bits), type: 4
Could not find both graphics and present queues

On my host I can tell it to use llvmpipe:

vkcube --gpu_number 1
Selected GPU 1: llvmpipe (LLVM 13.0.1, 256 bits), type: Cpu
Could not find both graphics and present queues

As you can see they have the same error. What's interesting is if I swap the container to FROM ubuntu:20.04 then llvmpipe can render but this is moot since I do not wish to do CPU rendering. The main issue here is that Vulkan is unable to detect my Nvidia GPU from within the container when using the Nvidia Container Toolkit with NVIDIA_DRIVER_CAPABILITIES=all and NVIDIA_VISIBLE_DEVICES=all. I've also tried using nvidia/vulkan. When running vulkaninfo in this container I get:

vulkaninfo
ERROR: [Loader Message] Code 0 : vkCreateInstance: Found no drivers!
Cannot create Vulkan instance.
This problem is often caused by a faulty installation of the Vulkan driver or attempting to use a GPU that does not support Vulkan.
ERROR at /vulkan-sdk/1.3.236.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:674:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER

I'm suspecting this has to to with me running Ubuntu 22.04 on the host although the whole point of docker is the host OS generally should not affect the container.

In the test above I was using nvidia-driver-525 I've tried using different versions of the driver with the same results. At this point I'm not sure if I'm doing something wrong or if Vulkan is not supported in the Nvidia Container Toolkit for Ubuntu 22.04 even though it claims to be.


Solution

  • I got it working all the way without needing to copy over nvidia_icd.json. Ian Purton your answer was very helpful in tracking down the answer. It led me here https://github.com/NVIDIA/nvidia-container-toolkit/issues/16 which shows that what Ian Purton posted was required until v1.12.0 of the NVIDIA Container Toolkit which fixes this problem. I've updated my repo with the fix. https://github.com/rickyjames35/vulkan_docker_test What I have in my repo is the most bare bones way you can run Vulkan in a docker container using docker compose.