cmusphinx

"backward.c", line 421: Failed to align audio to trancript


My script was running speech recognition training fine until I recently tried to scale up and train on more data. Now it outputs this error:

ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached

What does that mean? What can I do about it?

It looks like model training proceeds anyway, but I am not sure whether this is an error I can ignore.

I checked out this link, but I am pretty sure my audio is sampled at 16 kHz.


Solution

  • As explained in the documentation:

    Sometimes audio in your database doesn't match the transcription properly. For example transcription file has the line “Hello world” but in audio actually “Hello hello world” is pronounced. Training process usually detects that and emits this message in the logs. If there are too many such errors it most likely mean you misconfigured something, for example you had a mismatch between audio and the text caused by transcription reordering. Or input audio sample rate is wrong

    If there are few errors, you can ignore them. You might want to edit the transcription file to put there exact word which were pronounced, in the case above you need to edit the transcription file and put “Hello hello world” on corresponding line. You might want to filter such prompts because they affect acoustic model quality. In that case you need to enable forced alignment stage in training.
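To judge whether there are "few" or "too many" such errors, and to rule out the sample rate cause mentioned above, the following minimal Python sketch may help. It is not part of SphinxTrain. The function names, the idea of feeding it the log text of a single training pass, and the 16000 Hz default are assumptions made for illustration. The only thing taken from the question is the error text itself, including its "trancript" spelling.

```python
import re
import wave
from pathlib import Path
from typing import List

# Matches the alignment failure as quoted in the question, including the
# "trancript" spelling used by the message itself.
ALIGN_ERROR = re.compile(r"Failed to align audio to trancript")


def count_alignment_failures(log_text: str) -> int:
    """Count log lines that report a failed alignment."""
    return sum(1 for line in log_text.splitlines() if ALIGN_ERROR.search(line))


def failure_rate(log_text: str, total_utterances: int) -> float:
    """Fraction of utterances that failed to align (0.0 if the total is not positive)."""
    if total_utterances <= 0:
        return 0.0
    return count_alignment_failures(log_text) / total_utterances


def wrong_sample_rate(wav_dir: str, expected_hz: int = 16000) -> List[str]:
    """Return the WAV files under wav_dir whose sample rate differs from expected_hz."""
    bad = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        # The stdlib wave module reads the rate from the file header.
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() != expected_hz:
                bad.append(str(path))
    return bad
```

Pass `failure_rate` the log text of one training pass only, together with the number of utterances in your transcription file. If logs from several passes are concatenated, the same utterance may be counted more than once. A small rate suggests a handful of bad transcripts that can be fixed or filtered. A large rate points to a configuration problem such as reordered transcriptions or a sample rate mismatch, which `wrong_sample_rate` can help confirm for WAV input.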