
CUDA Version mismatch in Docker with WSL2 backend


I am trying to use Docker (Docker Desktop for Windows 10 Pro) with the WSL2 backend (Windows Subsystem for Linux (WSL), Ubuntu 20.04.4 LTS).

That part seems to work fine, but I would also like to pass my GPU (NVIDIA RTX A5000) through to my Docker container.

Before I even get that far, I am still setting things up. I found a very good tutorial aimed at Ubuntu 18.04 and found that all the steps are the same for 20.04, just with some version numbers bumped.

At the end, I can see that my CUDA versions do not match, as shown in this image.
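The image is not available, so this is an assumption: a common "mismatch" is between the toolkit version printed by `nvcc --version` and the "CUDA Version" in the `nvidia-smi` header. The two numbers mean different things. `nvcc` reports the toolkit installed in WSL, while `nvidia-smi` reports the newest CUDA version the driver supports, so a difference is not an error by itself. A minimal sketch that extracts both for comparison; the sample strings are hypothetical fragments, not output from this machine:

```python
import re

# Hypothetical output fragments for illustration only; paste your real output instead.
SMI_HEADER = "Driver Version: 472.84       CUDA Version: 11.4"
NVCC_OUTPUT = "Cuda compilation tools, release 11.6"


def parse_version(pattern: str, text: str) -> tuple[int, int]:
    """Return (major, minor) from the first match of `pattern` in `text`."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


# Newest CUDA version the driver supports (nvidia-smi header).
driver_cuda = parse_version(r"CUDA Version:\s*(\d+)\.(\d+)", SMI_HEADER)
# Version of the toolkit installed inside WSL (nvcc --version).
toolkit_cuda = parse_version(r"release\s+(\d+)\.(\d+)", NVCC_OUTPUT)

print(f"driver supports up to CUDA {driver_cuda[0]}.{driver_cuda[1]}")
print(f"toolkit is CUDA {toolkit_cuda[0]}.{toolkit_cuda[1]}")
if toolkit_cuda > driver_cuda:
    print("toolkit is newer than the driver reports; consider updating the Windows driver")
```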

The real issue is when I try to run the test command shown in the Docker documentation:

 docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I get this error:

 --> docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
 docker: Error response from daemon: OCI runtime create failed: container_linux.go:380:
 starting container process caused: process_linux.go:545: container init caused: Running
 hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli:
 requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a
 newer version, or use an earlier cuda container: unknown.
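The key part of the error is `unsatisfied condition: cuda>=11.6`. The container image asks for a driver that supports at least CUDA 11.6. Below is a small sketch that pulls that constraint out of the error text and compares it with the driver's supported version. The regex and the function name `required_cuda` are illustrative assumptions and are not part of the NVIDIA tooling:

```python
import re

ERROR_TEXT = (
    "nvidia-container-cli: requirement error: unsatisfied condition: "
    "cuda>=11.6, please update your driver to a newer version, "
    "or use an earlier cuda container: unknown."
)


def required_cuda(error_text: str) -> tuple[int, int]:
    """Extract the (major, minor) CUDA version from a `cuda>=X.Y` condition."""
    match = re.search(r"cuda>=(\d+)\.(\d+)", error_text)
    if match is None:
        raise ValueError("no cuda>= condition in the error text")
    return int(match.group(1)), int(match.group(2))


needed = required_cuda(ERROR_TEXT)
driver_cuda = (11, 4)  # what the 472.84 driver supports, per the answer below
print(f"image needs CUDA >= {needed[0]}.{needed[1]}")
print("driver OK" if driver_cuda >= needed else "driver too old")
```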

... and I just don't know what to do, or how I can fix this.

Can someone explain how to get the GPU to pass through to a Docker container successfully?


Solution

  • The comment from @RobertCrovella resolved this:

    Please update your driver to a newer version. When using WSL, the driver in your WSL setup is not something you install in WSL; it is provided by the driver on the Windows side. Your WSL driver is 472.84, and this is too old to work with CUDA 11.6 (it only supports up to CUDA 11.4). So you would need to update your Windows-side driver to the latest one possible for your GPU if you want to run a CUDA 11.6 test case. Regarding the "mismatch" of CUDA versions, this provides general background material for interpretation.
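The comment reduces to a simple check. The CUDA version an image requires must not exceed the newest version the Windows driver supports, and you upgrade that driver on Windows, not inside WSL. The sketch below encodes this check using only the two data points from this thread. The table and the function names are illustrative and do not form an official compatibility matrix:

```python
# Max CUDA version per Windows driver, taken only from this thread:
# 472.84 tops out at CUDA 11.4; after installing 511.79, CUDA 11.6 works.
DRIVER_MAX_CUDA = {
    "472.84": (11, 4),
    "511.79": (11, 6),
}


def driver_version_key(version: str) -> tuple[int, ...]:
    """Compare driver versions numerically ("511.79" > "472.84"), not as strings."""
    return tuple(int(part) for part in version.split("."))


def supports(driver: str, required: tuple[int, int]) -> bool:
    """True if the Windows-side driver supports at least `required` CUDA."""
    return DRIVER_MAX_CUDA[driver] >= required


for driver in sorted(DRIVER_MAX_CUDA, key=driver_version_key):
    status = "OK" if supports(driver, (11, 6)) else "too old for cuda>=11.6"
    print(f"driver {driver}: {status}")
```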

    I downloaded the most current NVIDIA driver:

    Version:             R510 U3 (511.79)  WHQL
    Release Date:        2022.2.14
    Operating System:    Windows 10 64-bit, Windows 11
    Language:            English (US)
    File Size:           640.19 MB
    

    With it installed, my setup now supports CUDA 11.6, and the test from the Docker documentation works:

    --> docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
            -fullscreen       (run n-body simulation in fullscreen mode)
            -fp64             (use double precision floating point values for simulation)
            -hostmem          (stores simulation data in host memory)
            -benchmark        (run benchmark to measure performance)
            -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
            -device=<d>       (where d=0,1,2.... for the CUDA device to use)
            -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
            -compare          (compares simulation results running once on the default GPU and once on the CPU)
            -cpu              (run n-body simulation on the CPU)
            -tipsy=<file.bin> (load a tipsy model file for simulation)
    
    NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
    
    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    GPU Device 0: "Ampere" with compute capability 8.6
    
    > Compute 8.6 CUDA device: [NVIDIA RTX A5000]
    65536 bodies, total time for 10 iterations: 58.655 ms
    = 732.246 billion interactions per second
    = 14644.916 single-precision GFLOP/s at 20 flops per interaction
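The three summary numbers in the benchmark output are consistent with each other. Each iteration computes every body against every other body, so interactions per second are roughly bodies² × iterations ÷ time, and GFLOP/s is that value times the 20 flops per interaction the output prints. The check below reconstructs that arithmetic from the printed values and not from the sample's own source. Small differences from the printed figures come from the rounded time:

```python
bodies = 65536
iterations = 10
total_ms = 58.655           # "total time for 10 iterations"
flops_per_interaction = 20  # single precision, as printed

# Every body interacts with every body once per iteration.
interactions_per_second = bodies * bodies * iterations / (total_ms / 1000.0)
gflops = interactions_per_second * flops_per_interaction / 1e9

print(f"{interactions_per_second / 1e9:.3f} billion interactions per second")
print(f"{gflops:.3f} GFLOP/s")
```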
    

    Thank you for the quick response!