I'm using PocketSphinx on Android. After the recognizer initializes, I start a keyword listener. At first, the recognizer does not match anything. After a few seconds, though, it starts matching keywords with excellent performance (about a 3% WER in initial testing).
The time it takes to start matching depends on the word or phrase, and also seems to depend on how many times you say it. For instance, "plus" is matched very quickly, usually on the first or second utterance, taking an average of 2 seconds. "A little help please", on the other hand, takes around 10 seconds, or about 8-10 utterances. Once any keyword is matched, Sphinx enters its high-performance mode for all keywords. So one workaround (although not a very good one) is to say "plus" immediately after initialization completes.
During the time that no matching occurs, Sphinx still calls onBeginningOfSpeech() and onEndOfSpeech() for each utterance of the keyword or key phrase.
Keyword file:
cancel last
a little help please
add new cut/1e-35/
set material
set quantity
plus/5e-2/
minus/5e-2/
I'm using pocketsphinx-android-5prealpha-nolib.jar and, if it makes a difference, have tested on a Samsung Galaxy S3 and a Motorola Moto E (2nd Gen). The problem is the same whether or not I use a headset.
Use the standard model that ships with the PocketSphinx demo, en-us-ptm. It's a lightweight* model and has default CMN (cepstral mean normalization) values set in its feat.params file. Because the CMN values are preset, Sphinx doesn't have to spend time estimating them at startup, so there is no delay before it produces quality recognition results. With my command-and-control grammars, overall recognition results with the default model are very similar to those of the other models I've tested.
* less than 7MB vs. some others like Voxforge that are more than double that
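To check whether some other acoustic model also ships with preset CMN values, one option is to look for an initial-CMN entry in its feat.params. Below is a minimal Java sketch, assuming the file holds one `-name value` pair per line and that `-cmninit` is the entry carrying the initial values. The `FeatParams` class and the sample values are hypothetical and only for illustration; they are not copied from any particular model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PocketSphinx.
class FeatParams {

    // Parses lines of the form "-name value" into a map.
    // Assumption: one parameter per line; lines not starting with '-' are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || !trimmed.startsWith("-")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            params.put(parts[0], parts.length > 1 ? parts[1].trim() : "");
        }
        return params;
    }

    // True when the text contains a non-empty -cmninit entry.
    static boolean hasInitialCmn(String text) {
        String value = parse(text).get("-cmninit");
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        // Illustrative contents only; read the real file from your model directory.
        String sample = "-feat 1s_c_d_dd\n-cmn current\n-cmninit 40,3,-1\n";
        System.out.println(hasInitialCmn(sample)); // prints "true"
    }
}
```

If the check comes back false for the model you are using, that would be consistent with the startup delay described in the question, though I have only compared the handful of models mentioned above.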