python · tensorflow · speech-recognition · ctc

Error when trying to predict audio: Could not compute output Tensor ("ctc/ExpandDims_22:0"


I tried to create a speech recognition neural network using the LibriSpeech dev-clean dataset. I converted the code from https://github.com/soheil-mpg/Speech-Recognition into a Jupyter notebook.

Everything appears to be working: the model can be trained and doesn't give any errors. But when I call model.predict(), I get the following error:

AssertionError: Could not compute output Tensor("ctc/ExpandDims_22:0", shape=(None, 1), dtype=float32)

I uploaded the Jupyter Notebook to https://github.com/jake-salmone/ASR

The code is almost identical; the only thing I have changed is that I don't use the JSON file, but a pandas DataFrame instead.


Solution

  • I found the answer: the model has the wrong output dimensions.
    The CTC loss should, of course, only be added to the model during training.

    Adding the CTC loss should therefore only happen within the scope of a function:

    model = add_ctc_loss(model)
    

    A training function that adds the loss only within its own scope will not change the model, so the original model keeps its prediction outputs and model.predict() works again (see the sketch below).
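    A minimal sketch of that pattern, assuming the add_ctc_loss helper and the data generators from the linked notebook; the optimizer, epoch count, and the names train_generator, valid_generator, and audio_features are placeholders, not taken from the original code:

    def train(model, train_gen, valid_gen, epochs=20):
        # Rebinding `model` inside this function only changes the local name,
        # so the prediction model outside stays unchanged.
        model = add_ctc_loss(model)

        # The wrapped model's output already is the CTC loss value, so the
        # Keras loss function just passes it through.
        model.compile(optimizer="adam", loss=lambda y_true, y_pred: y_pred)
        model.fit(train_gen, validation_data=valid_gen, epochs=epochs)

    # Train without touching the prediction model ...
    train(model, train_generator, valid_generator)

    # ... then predict with the original model, whose output is still the
    # per-timestep softmax rather than the (None, 1) CTC loss tensor.
    predictions = model.predict(audio_features)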