I just started with TF and Keras and found that I can't run them on my computer. I first noticed the problem in a Jupyter notebook and then reproduced it in a plain Python file.
The code to reproduce (main.py):
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras
from keras import layers
from keras import ops
print("pass")
model = keras.Sequential()
model.add(layers.Input(shape=(28,)))
print("pass")
The python main.py
output:
2024-10-16 22:30:52.054073: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1729081852.071578 41750 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1729081852.076640 41750 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
pass
/usr/include/c++/14.1.1/bits/stl_vector.h:1130: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
fish: Job 1, 'python main.py' terminated by signal SIGABRT (Abort)
I'm using Arch Linux and installed the python-tensorflow-opt-cuda package. I have an NVIDIA card and a working CUDA installation.
The nvcc --version
output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
The python -c "import tensorflow as tf; print(tf.__version__)"
output:
2024-10-16 22:35:45.309836: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1729082145.328052 42393 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1729082145.333326 42393 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2.18.0-rc1
The pacman -Q cuda
output:
cuda 12.6.2-1
The pacman -Q cudnn
output:
cudnn 9.2.1.18-1
The pacman -Q nvidia
output:
nvidia-dkms 560.35.03-14
The pacman -Q blas
output:
blas 3.12.0-5
The pacman -Q python-tensorflow-opt-cuda
output:
python-tensorflow-opt-cuda 2.18rc1-2
The pacman -Q python
output:
python 3.12.7-1
The pacman -Q python-numpy
output:
python-numpy 2.1.2-1
The pacman -Q python-keras
output:
python-keras 3.4.1-1
The sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1650 Off | 00000000:01:00.0 Off | N/A |
| N/A 43C P8 1W / 50W | 8MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Kernel: 6.11.3-zen1-1-zen
CPU: AMD Ryzen 5 4600H with Radeon Graphics (12) @ 3.000GH
GPU: NVIDIA GeForce GTX 1650 Mobile / Max-Q
I also don't understand why the Arch Linux package repo ships TF 2.18. Isn't it still unstable?
I've tried reinstalling all of the packages, but I haven't downgraded them yet. I also tried the Docker version of TensorFlow and still got the same errors.
Found the problem.
It was an outdated python-optree package. Version 0.13.0 doesn't seem to have the issue.
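For anyone who wants to check which optree version their interpreter actually sees, here is a minimal sketch using importlib.metadata from the standard library. The helper names (installed_version, version_tuple, is_at_least) are my own and not part of optree. It also assumes the distribution is registered under the name "optree". The 0.13.0 threshold is only the version I saw working; I did not check which older releases are affected.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.13.0' or '2.18.0rc1' into a tuple of leading integers, e.g. (0, 13, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(version, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = version_tuple(version), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    found = installed_version("optree")  # assumed distribution name
    if found is None:
        print("optree is not installed in this environment")
    elif is_at_least(found, "0.13.0"):
        print(f"optree {found}: at least 0.13.0")
    else:
        print(f"optree {found}: older than 0.13.0, consider upgrading")
```

The comparison is deliberately simple and ignores pre-release suffixes such as rc1, so treat it as a quick check rather than a full version parser.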
The issue was:
>>> from optree import *
>>> tree_map(lambda x: x, ())
/usr/include/c++/14.1.1/bits/stl_vector.h:1130: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
fish: Job 1, 'python' terminated by signal SIGABRT (Abort)
Empty tuples or arrays triggered the issue.
In version 0.13.0:
>>> from optree import *
>>> tree_map(lambda x: x, ())
()
It no longer crashes, and everything works.
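Because the failure is a SIGABRT rather than a Python exception, try/except cannot catch it, and testing in the REPL kills the session. A rough sketch of how to test it safely is to run the same tree_map call in a child interpreter and look at the exit status. run_probe and describe are hypothetical helper names of mine. The negative-return-code convention for signals is documented subprocess behaviour on POSIX systems.

```python
import signal
import subprocess
import sys

# The same call that aborted above, run in a separate interpreter.
OPTREE_PROBE = "import optree; print(optree.tree_map(lambda x: x, ()))"


def run_probe(code, timeout=60.0):
    """Run `code` in a fresh Python process; return (returncode, stripped stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip()


def describe(returncode):
    """Summarise an exit status; on POSIX a negative value means 'killed by signal'."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    code, out = run_probe(OPTREE_PROBE)
    print(describe(code), out)
```

With a fixed optree the child should exit normally and print (). With the broken build I would expect the status to report SIGABRT. If optree is not installed at all, the child just exits with a non-zero status from the ImportError.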