speech-recognition htk dictation

Can we use HTK for a dictation-like application?


I want to build a speech recognition system for a dictation-like application. I have read the HTK Book and other tutorials, but all of them cover command-and-control applications. In those applications the set of commands and words is limited and is specified manually using a task grammar (gram file).

In my application it is not possible to specify such a grammar, because I will be processing huge audio files containing conversations between two people.

So I would like to know whether it is possible to build such an application using HTK.

Thanks...


Update after spending many sleepless nights

I got 86% accuracy using Sphinx. There was a problem with the language model (I do not know exactly what was wrong with it; I am still trying to find out), so I created a new language model using the Sphinx lmtool, a web-based language model generation service. You can get it using this link

Also, I have changed the acoustic model from HUB to WSJ.


Solution

  • Yes, you can. There are two decoders for that purpose:

    ATK

    and

    Julius

    Both require you to provide a language model for large-vocabulary speech recognition.

    I also suggest you look at CMUSphinx, which is somewhat easier to use.
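To give a rough idea of what a language model supplies in place of a hand-written task grammar, here is a minimal Python sketch of a bigram model with add-one smoothing. It is only an illustration: the function names and the tiny corpus are made up for this example, and it does not produce a file that any of the decoders above can load. In practice the model is usually an n-gram file built from a large text corpus with a dedicated toolkit.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count bigrams and their left contexts from whitespace-tokenized text."""
    context_counts = Counter()            # how often each word occurs as a left context
    bigram_counts = defaultdict(Counter)  # bigram_counts[prev][cur] = count
    vocab = set()
    for sentence in sentences:
        # <s> and </s> are sentence boundary markers (a common convention).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][cur] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, cur, context_counts, bigram_counts, vocab):
    """Estimate P(cur | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[prev][cur] + 1) / (context_counts[prev] + len(vocab))


# Hypothetical toy corpus; a real model needs far more text.
corpus = [
    "how are you today",
    "how are things going",
    "are you coming today",
]

context_counts, bigram_counts, vocab = train_bigram_model(corpus)

# A word pair seen in the corpus scores higher than an unseen one,
# but the unseen pair is still allowed (non-zero probability).
p_seen = bigram_prob("how", "are", context_counts, bigram_counts, vocab)
p_unseen = bigram_prob("how", "today", context_counts, bigram_counts, vocab)
print(p_seen, p_unseen)
```

Unlike a task grammar, nothing here lists the allowed sentences. Any word sequence is possible, and the decoder uses the probabilities to prefer likely ones, which is what makes this approach workable for open-ended conversation.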