Using tensorflow with GPU on Docker on Ubuntu

I've been struggling to the problem written below for many days and would like you to help me.
What I want to do is to use tensorflow with GPU on Docker on Ubuntu.
My GPU is GeForce GTX 1070, and my OS is Ubuntu 22.04.3 LTS

I've installed Docker

$ docker --version

Docker version 26.1.1, build 4cf5afa

Before I started the following, I removed every nvidia or cuda module.

$ sudo apt-get -y --purge remove nvidia*
$ sudo apt-get -y --purge remove cuda*
$ sudo apt-get -y --purge remove cudnn*
$ sudo apt-get -y --purge remove libnvidia*
$ sudo apt-get -y --purge remove libcuda*
$ sudo apt-get -y --purge remove libcudnn*
$ sudo apt-get autoremove
$ sudo apt-get autoclean
$ sudo apt-get update
$ sudo rm -rf /usr/local/cuda*
$ pip uninstall tensorflow-gpu

Afterward, I installed Nvidia driver

$ sudo apt install nvidia-driver-535

And nvidia-smi works fine.

$ nvidia-smi

Thu May 2 18:10:31 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
...

The next thing I did was to install CUDA Toolkit 12.2 Update 2 following the instruction shown below.

https://developer.nvidia.com/cuda-12-2-2-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

I think CUDA Toolkit 12.2 Update 2 and driver 535.104.05 are compatible according to the info shown below.

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

And then I installed NVIDIA Container Toolkit like below

$ curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

And next, I pulled a docker image.

$ docker pull tensorflow/tensorflow:latest-gpu

$ docker container run --rm --gpus all -it --name tf --mount type=bind,source=/home/(myname)/docker/tensorflow,target=/bindcont tensorflow/tensorflow:latest-gpu bash

In Docker container

root@a887e2a18124:/# python

Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf

2024-05-02 09:32:46.211605: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-02 09:32:46.238888: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

>>> tf.config.list_physical_devices()

2024-05-02 09:32:55.124912: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW
2024-05-02 09:32:55.124931: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:134] retrieving CUDA diagnostic information for host: 226046be5f09 2024-05-02 09:32:55.124934: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:141] hostname: 226046be5f09 2024-05-02 09:32:55.124963: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:165] libcuda reported version is: 545.23.6
2024-05-02 09:32:55.124975: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:169] kernel reported version is: 535.104.5
2024-05-02 09:32:55.124977: E external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:251] kernel version 535.104.5 does not match DSO version 545.23.6 -- cannot find working devices in this configuration
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
-- End of Message --

It seems the driver version and cuda version are inconsistent but I installed a dvriver version 535 not 545 as shown above. And I removed everything before I installed the driver-535.

Could anyone suggest what is wrong and what I should do?

My problem has not been solved yet.

I removed everything and reinstalled the Nvidia driver-545.
And followed the instruction https://github.com/NVIDIA/nvidia-docker (deprecated) and
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

This time I didn't installed CUDA Tool-kit but NVIDIA Container Toolkit.

I got from nvidia-smi
NVIDIA-SMI 545.29.06
Driver Version 545.29.06
CUDA Version 12.3

Then I ran a container

$ docker container run --rm -it --name tf --mount type=bind,source=/home/susumu/docker/tensorflow,target=/bindcont tensorflow/tensorflow:2.15.0rc1-gpu bash

When I ran sample.py, I got

# python sample.py

2024-05-02 13:46:01.669548: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used. 2024-05-02 13:46:01.689375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-02 13:46:01.689395: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-05-02 13:46:01.690008: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-05-02 13:46:01.693281: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-02 13:46:01.693384: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-05-02 13:46:02.374705: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:274] failed call to cuInit: UNKNOWN ERROR (34)

tf.Tensor( [[1.] [1.]], shape=(2, 1), dtype=float32)

Here, sample.py is like below

# cat sample.py
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

import tensorflow as tf
x = tf.ones(shape=(2, 1))

print(x)

As mhenning pointed out, my command lacked "--gpus all". So I added that option, and ran again.

$ docker container run --rm -it --gpus all --name tf --mount type=bind,source=/home/susumu/docker/tensorflow,target=/bindcont tensorflow/tensorflow:2.15.0rc1-gpu bash
root@112cb77313ca:/bindcont# python sample.py

2024-05-07 13:02:16.253609: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used. 2024-05-07 13:02:16.273561: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-07 13:02:16.273586: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-05-07 13:02:16.274202: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-05-07 13:02:16.277520: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-07 13:02:16.277623: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-05-07 13:02:17.104731: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-07 13:02:17.107016: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...

tf.Tensor(  
[[1.]  
 [1.]], shape=(2, 1), dtype=float32)

Edited on May 13, 2024

As mhenning suggested, I pulled tensorflow/tensorflow:2.14.0-gpu and tried to run sample.py again.

root@e02085a11772:/bindcont# python sample.py

2024-05-13 12:42:46.130673: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-13 12:42:46.130698: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-05-13 12:42:46.130737: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-05-13 12:42:46.134588: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-05-13 12:42:46.980797: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:46.986228: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:46.986414: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:46.987792: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:46.987949: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:46.988049: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:47.092104: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:47.092243: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:47.092328: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355 2024-05-13 12:42:47.092398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7329 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:04:00.0, compute capability: 6.1 tf.Tensor( [[1.] [1.]], shape=(2, 1), dtype=float32)

It seems my GPU works properly!

root@bce19cf9ec80:/# python -c "import tensorflow as tf;print(tf.sysconfig.get_build_info())"

2024-05-13 12:46:41.451323: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0. 2024-05-13 12:46:41.471928: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-13 12:46:41.471950: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-05-13 12:46:41.471964: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-05-13 12:46:41.475902: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. OrderedDict([('cpu_compiler', '/usr/lib/llvm-16/bin/clang'), ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '11.8'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

By the way, when I checked my BIOS setting, secure boot was off, so it had nothing to do with my trouble.

Solution

It seems that for docker container you don't need to install CUDA drivers on the host system. From the link:

Docker is the easiest way to enable TensorFlow GPU support on Linux since only the NVIDIA® GPU driver is required on the host machine (the NVIDIA® CUDA® Toolkit does not need to be installed).

This TF container installs CUDA 12.3 inside (you can see it in the image layers list here), and according to this table (table 3 in the NVIDIA CUDA Toolkit Release Notes) CUDA 12.3 needs nvidia drivers >=545 (which is the missmatch in the error stack). This is a bit counterintuitive to other tables where the minimum requirements for CUDA 12 is just driver versions >=525. From the link:

CUDA Toolkit        Toolkit Driver Version  
                    Linux x86_64 Driver Version   Windows x86_64 Driver Version
CUDA 12.4 Update 1  >=550.54.15                   >=551.78
CUDA 12.4 GA        >=550.54.14                   >=551.61
CUDA 12.3 Update 1  >=545.23.08                   >=546.12
CUDA 12.3 GA        >=545.23.06 <- this one       >=545.84
CUDA 12.2 Update 2  >=535.104.05                  >=537.13
...

The easiest way would be to update your drivers to version >=545. It seems that for a 1070, the latest drivers are 550, so you should be fine version-wise.
Alternatively, you could use the last docker image with CUDA 11.8, ~~which would be tensorflow:2.15.0rc1-gpu~~ (This is wrong, see edit). All other gpu images after this seems to use CUDA 12.3 when looking at the definitions in the image layers.

Edit: I was wrong with the tensorflow tag I recommended, this was indeed the worst version to recommend. It indeed installs CUDA 11.8, which one can check with nvcc --version, but when you check the required CUDA version with tf.sysconfig.get_build_info() (or one line with):

python3 -c "import tensorflow as tf;print(tf.sysconfig.get_build_info())"

you get the following with tensorflow:2.15.0rc1-gpu :

OrderedDict([('cpu_compiler', '/usr/lib/llvm-17/bin/clang'), ('cuda_compute_capabilities', ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '12.2'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

Here you can see that TF expects CUDA 12.2, which is obviously not installed, voiding the GPU capabilities of this container.

I checked with the other tags, for CUDA 11.8 you can use the tag tensorflow/tensorflow:2.14.0-gpu, and for CUDA 12.2 you can use tensorflow/tensorflow:2.15.0-gpu (non-rc version) and above.