I'm trying to compile the TensorFlow 2.3 C API for Xavier in a Docker image. I'm using nvcr.io/nvidia/l4t-base:r32.5.0 as the base image, which seems to have the correct version of CUDA installed, but the build fails with the following message:
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
#9 51.98 _create_local_cuda_repository(<1 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 955, in _create_local_cuda_repository
#9 51.98 _get_cuda_config(repository_ctx, <1 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 657, in _get_cuda_config
#9 51.98 find_cuda_config(repository_ctx, <2 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 635, in find_cuda_config
#9 51.98 _exec_find_cuda_config(<3 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 629, in _exec_find_cuda_config
#9 51.98 execute(repository_ctx, <1 more arguments>)
#9 51.98 File "/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
#9 51.98 fail(<1 more arguments>)
#9 51.98 Repository command failed
#9 51.98 Could not find any libcudart.so.10* in any subdirectory:
#9 51.98 ''
#9 51.98 'lib64'
#9 51.98 'lib'
#9 51.98 'lib/*-linux-gnu'
#9 51.98 'lib/x64'
#9 51.98 'extras/CUPTI/*'
#9 51.98 of:
#9 51.98 '/usr/local/cuda-10.2'
Here are the relevant parts of my Dockerfile for reference:
FROM nvcr.io/nvidia/l4t-base:r32.5.0
# ... setup bazel etc
# Tensorflow
ENV TF_NEED_CUDA=1 \
GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
TF_CUDA_VERSION=10.2 \
CUDA_TOOLKIT_PATH=/usr/local/cuda-10.2 \
TF_CUDNN_VERSION=8 \
CUDNN_INSTALL_PATH=/usr/local/cuda-10.2 \
TF_CUDA_COMPUTE_CAPABILITIES=7.2,7.5 \
CC_OPT_FLAGS="--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 --copt=-mfpmath=both --config=cuda" \
PYTHON_BIN_PATH="/usr/bin/python" \
USE_DEFAULT_PYTHON_LIB_PATH=1 \
TF_NEED_JEMALLOC=1 \
TF_NEED_GCP=0 \
TF_NEED_HDFS=0 \
TF_ENABLE_XLA=0 \
TF_NEED_OPENCL=0
RUN cd / && git clone https://github.com/tensorflow/tensorflow
# The bazel build in the next line fails
RUN cd /tensorflow && git checkout r2.3 && bazel build -c opt //tensorflow/tools/lib_package:libtensorflow
Am I missing some compile options, or do I have to do some extra steps to properly set up CUDA?
It seems that building TensorFlow 2.3 for 64-bit ARM with CUDA isn't possible. TensorFlow 2.3 needs CUDA 10.2, but the CUDA toolkit isn't supported on ARM until version 11 [1], and TensorFlow doesn't support CUDA 11 until version 2.4 [1].
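To confirm what the error reports, a check like the one below could be run inside the image before the bazel step. It is only a sketch: `find_cudart` is a hypothetical helper, not part of TensorFlow, and it simply looks for `libcudart.so.10*` in the same sub-directories that the error message lists under the toolkit path.

```shell
#!/bin/sh
# Hypothetical helper (not part of TensorFlow): print every libcudart.so.10*
# found under the sub-directories listed in the build error.
# Returns 0 if at least one file was found, 1 otherwise.
find_cudart() {
    root="$1"
    status=1
    # Same sub-directory patterns as in the error output; "" is the root itself.
    for sub in "" "lib64" "lib" "lib/*-linux-gnu" "lib/x64" "extras/CUPTI/*"; do
        # The expansion of $sub is deliberately unquoted so its wildcards expand.
        for f in "$root"/${sub:+$sub/}libcudart.so.10*; do
            # An unmatched pattern stays literal, so test that the path exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                status=0
            fi
        done
    done
    return "$status"
}
```

Called as `find_cudart /usr/local/cuda-10.2` in a RUN step, it prints nothing and returns 1 if no matching CUDA runtime library is present in the image at build time, which is the situation the configure step complains about.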