I'm trying to train a DeepSpeech model on the Common Voice dataset as described in the documentation, but it fails with the following error:
```
I0421 11:34:32.779112 140581195995008 utils.py:157] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>
   [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
    load_or_init_graph_for_training(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 112, in _load_or_init_impl
    return _initialize_all_variables(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 88, in _initialize_all_variables
    session.run(v.initializer)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>
   [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
```
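The telling part of the error is `Registered devices: [CPU, XLA_CPU]` together with `<no registered kernels>`: `CudnnRNNCanonicalToParams` is a cuDNN op, and the TensorFlow build that was loaded did not register any GPU device to run it on. A minimal sketch for checking this from the training interpreter, assuming TensorFlow is importable there; `can_run_cudnn_ops` and `local_device_types` are hypothetical helper names, while `tf.test.is_built_with_cuda()` and `device_lib.list_local_devices()` are existing TensorFlow calls.

```python
"""Check whether the active TensorFlow build exposes a GPU for cuDNN ops."""
from typing import Iterable, List


def can_run_cudnn_ops(device_types: Iterable[str]) -> bool:
    # cuDNN kernels such as CudnnRNNCanonicalToParams are registered only for
    # the GPU device type, so a list like ["CPU", "XLA_CPU"] cannot run them.
    return any(t.upper() == "GPU" for t in device_types)


def local_device_types() -> List[str]:
    # Imported lazily so can_run_cudnn_ops() works without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return [d.device_type for d in device_lib.list_local_devices()]


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this interpreter")
    else:
        print("TensorFlow version:", tf.__version__)
        print("Built with CUDA:", tf.test.is_built_with_cuda())
        types = local_device_types()
        print("Local device types:", types)
        print("cuDNN ops can run:", can_run_cudnn_ops(types))
```

If `Built with CUDA` prints `False`, the interpreter is importing a CPU-only TensorFlow build, regardless of which CUDA and cuDNN versions are installed on the system.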
My local machine specs are:
Python 3.7; CUDA 10.1; cuDNN 7.6.5; tensorflow-gpu 1.15.2; GPU: GTX 1050 Ti
To prepare the environment, I also install the following packages and libraries:
```
!apt-add-repository universe
!apt-get install sox libsox-fmt-mp3 cmake libblkid-dev e2fslibs-dev libboost-all-dev libaudit-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
!python3.7 -m pip install sox
!python3.7 -m pip install deepspeech-gpu
!python3.7 -m pip install tensorflow-gpu==1.15.2
!python3.7 -m pip install numpy==1.19.5
!python3.7 -m pip install progressbar2
!python3.7 -m pip install progressbar
!python3.7 -m pip install progressbar33
!python3.7 -m pip install ds_ctcdecoder==0.10.0-alpha.3
!python3.7 -m pip install pyogg==0.6.14a1
!python3.7 -m pip install deepspeech
!git clone --branch v0.9.3 https://github.com/mozilla/DeepSpeech
!python3.7 -m pip install --upgrade --force-reinstall -e ./DeepSpeech/
!git clone https://github.com/kpu/kenlm.git
!mkdir -p build
!cmake kenlm
!make -j 4
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-checkpoint.tar.gz
!curl -LO "https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/native_client.amd64.cuda.linux.tar.xz"
!mkdir native_client
!tar xvf native_client.amd64.cuda.linux.tar.xz -C native_client
```
I get the same error both on my local machine and on a Google Colab VM.
EDIT: I also switched my CUDA and cuDNN versions to 10.0 and 7.5.6, respectively, but the error persists.
I have fixed the problem. It was caused by the TensorFlow version: as mentioned above, I was using TF 1.15.2, but I needed TF 1.15.4 instead.
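Since the fix came down to the installed TensorFlow version, it can help to confirm what the training interpreter actually has installed, because later `pip install` steps can replace an earlier pin. A minimal sketch, assuming Python 3.8+'s `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport offers the same API); the expected version 1.15.4 comes from the fix above, and `installed_versions` and `find_problems` are hypothetical helper names. It also flags the case where the CPU-only `tensorflow` package sits alongside `tensorflow-gpu`, since both install the same `tensorflow` module and can overwrite each other's files.

```python
"""Report which TensorFlow distributions are installed and flag mismatches."""
from typing import Dict, Iterable, List, Optional

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    import importlib_metadata as metadata  # type: ignore

REQUIRED_TF_GPU = "1.15.4"  # version that resolved the error in this post
TF_DISTS = ("tensorflow", "tensorflow-gpu")


def installed_versions(names: Iterable[str] = TF_DISTS) -> Dict[str, Optional[str]]:
    # Map each distribution name to its installed version, or None if absent.
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_problems(versions: Dict[str, Optional[str]],
                  required: str = REQUIRED_TF_GPU) -> List[str]:
    problems = []
    gpu = versions.get("tensorflow-gpu")
    cpu = versions.get("tensorflow")
    if gpu is None:
        problems.append("tensorflow-gpu is not installed")
    elif gpu != required:
        problems.append(f"tensorflow-gpu {gpu} installed, expected {required}")
    if cpu is not None:
        # Both packages provide the `tensorflow` module; the CPU one may win.
        problems.append(f"CPU-only tensorflow {cpu} is also installed")
    return problems


if __name__ == "__main__":
    found = installed_versions()
    print("Installed:", found)
    for problem in find_problems(found) or ["no problems found"]:
        print("-", problem)
```

Run it with the same interpreter used for training (here `python3.7`), after all the install steps have finished.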