I tried to build a speech recognition neural network using the dev-clean subset of the LibriSpeech dataset. I converted the code from https://github.com/soheil-mpg/Speech-Recognition into a Jupyter notebook.
Everything appears to work: the model trains without any errors. But when I call model.predict(), I get the following error:
AssertionError: Could not compute output Tensor("ctc/ExpandDims_22:0", shape=(None, 1), dtype=float32)
I uploaded the Jupyter Notebook to https://github.com/jake-salmone/ASR
The code is almost identical; the only thing I changed is that I use a pandas DataFrame instead of the JSON.
I found the answer: the model has the wrong output dimensions.
The CTC loss should, of course, only be added to the model during training.
The call that adds the CTC loss should therefore only happen within the scope of a function:
model = add_ctc_loss(model)
A training function that adds the loss only within its own scope does not change the original model, so model.predict() still returns the network's actual output.
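
Here is a minimal sketch of that pattern, assuming TensorFlow's Keras API. Only the name add_ctc_loss comes from the post; its body here follows the common Keras CTC wrapper pattern and may differ from the notebook's version. build_acoustic_model, train, the toy single-Dense architecture, the layer names, and the random data are all hypothetical stand-ins used to keep the example small.

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Input, Lambda
from tensorflow.keras.models import Model

# Hypothetical sizes; the last class index acts as the CTC blank.
N_FEATURES, N_CLASSES = 13, 29


def build_acoustic_model():
    """Toy stand-in for the real network: a per-frame softmax over classes."""
    features = Input(shape=(None, N_FEATURES), name="the_input")
    y_pred = Dense(N_CLASSES, activation="softmax", name="softmax")(features)
    return Model(inputs=features, outputs=y_pred)


def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)


def add_ctc_loss(base_model):
    """Return a NEW model that wraps base_model and outputs the CTC loss."""
    labels = Input(shape=(None,), name="the_labels", dtype="float32")
    input_length = Input(shape=(1,), name="input_length", dtype="int64")
    label_length = Input(shape=(1,), name="label_length", dtype="int64")
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name="ctc")(
        [base_model.output, labels, input_length, label_length]
    )
    return Model(
        inputs=[base_model.input, labels, input_length, label_length],
        outputs=loss_out,
    )


def train(base_model, inputs, epochs=1):
    # The loss-wrapped model is a local variable, so the caller's model
    # keeps its single input and its softmax output.
    training_model = add_ctc_loss(base_model)
    # The "ctc" layer already computes the loss, so the Keras loss
    # function simply passes that output through.
    training_model.compile(optimizer="adam", loss=lambda y_true, y_pred: y_pred)
    dummy_targets = np.zeros((len(inputs[0]), 1), dtype="float32")
    training_model.fit(inputs, dummy_targets, epochs=epochs, verbose=0)


# Random data, only to exercise the shapes.
batch, time_steps, label_len = 4, 20, 5
x = np.random.rand(batch, time_steps, N_FEATURES).astype("float32")
y = np.random.randint(0, N_CLASSES - 1, size=(batch, label_len)).astype("float32")
x_len = np.full((batch, 1), time_steps, dtype="int64")
y_len = np.full((batch, 1), label_len, dtype="int64")

model = build_acoustic_model()
train(model, [x, y, x_len, y_len])      # model itself is never reassigned
predictions = model.predict(x, verbose=0)  # shape: (batch, time_steps, N_CLASSES)
```

Because both models share the same layers, training the wrapped model updates the weights that model.predict() uses, while model keeps its original single input and softmax output.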