I would like my Android application to do continuous keyword spotting. I'm modifying the pocketsphinx Android demo to test how to do it. I wrote the following list in a file named en-keywords.txt, picking words from cmudict-en-us.dict:
rainbow /1e-50/
about /1e-50/
blood /1e-50/
energies /1e-50/
In the setupRecognizer method I removed every search and added only this keyword search to the recognizer:
File keywords = new File(assetsDir, "en-keywords.txt");
recognizer.addKeywordSearch(KWS_SEARCH, keywords);
Finally I modified onPartialResult like this:
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;
    String text = hypothesis.getHypstr();
    switchSearch(KWS_SEARCH);
}
so that every time a partial result arrives with a non-null hypothesis, onResult is called and the search starts again.
What I see in the app running is not what I'm expecting:
I tried different thresholds for the keywords but I didn't find my way. Probably I'm missing some basic concept or some configuration parameter. Can someone help me with this?
The solution is definitely to understand how thresholds work and to tune them correctly. I read on the SourceForge forum that the higher the threshold (max 1), the fewer the false alarms (at the risk of missing true matches), and vice versa (min 1e-50). Pocketsphinx uses your threshold and returns a match if the weight of a possible recognition is greater than or equal to it: giving a keyphrase a threshold of 1 means you want that keyphrase in the result only if pocketsphinx is absolutely sure of what has been spoken.
I was using 1e-50, which is a very low threshold that leads to a lot of false alarms: with that threshold almost everything you say will be understood as one or more of the keywords in your list. This is the answer to points 1 and 2 in my question.
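To make the tuning step a bit more concrete, here is a minimal sketch of generating the keyword file with a separate threshold per keyphrase instead of 1e-50 everywhere. The class and method names are hypothetical helpers of mine, not part of pocketsphinx, and the threshold values are illustrative starting points only; each one still has to be tuned by testing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeywordListSketch {

    /** Formats one line in the "phrase /threshold/" layout used by en-keywords.txt. */
    public static String formatLine(String phrase, String threshold) {
        double value = Double.parseDouble(threshold);
        // Range as described above: thresholds go from 1e-50 up to 1.
        if (value < 1e-50 || value > 1.0) {
            throw new IllegalArgumentException("threshold out of range: " + threshold);
        }
        return phrase + " /" + threshold + "/";
    }

    /** Joins all entries into the text of a keyword file, one keyphrase per line. */
    public static String buildKeywordFile(Map<String, String> thresholds) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : thresholds.entrySet()) {
            sb.append(formatLine(entry.getKey(), entry.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the keyphrases in insertion order.
        Map<String, String> thresholds = new LinkedHashMap<>();
        // Illustrative values only, not tuned results.
        thresholds.put("rainbow", "1e-20");
        thresholds.put("about", "1e-10");
        thresholds.put("blood", "1e-10");
        thresholds.put("energies", "1e-20");
        System.out.print(buildKeywordFile(thresholds));
    }
}
```

The resulting text can be saved as en-keywords.txt and passed to addKeywordSearch exactly as before; only the numbers between the slashes change while tuning.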
About my 3rd point, the answer is that hypothesis.getHypstr() in onResult contains a concatenation of every possible match found. To tell one match from another by looking at their weights, it should be possible to iterate over the segments returned by recognizer.getDecoder().seg() (see here).
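As a rough sketch of what "looking at weights" could mean, the snippet below picks the strongest match out of several. Match is a hypothetical stand-in class so the example runs without pocketsphinx; in the app the entries would be built while iterating recognizer.getDecoder().seg(). I am assuming that each segment exposes the matched word and a numeric score where higher means more confident, so check the accessors of the Segment class in your pocketsphinx version before relying on this.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentFilterSketch {

    /** Hypothetical stand-in for one decoder segment: matched keyphrase plus its score. */
    public static final class Match {
        public final String word;
        public final int score;

        public Match(String word, int score) {
            this.word = word;
            this.score = score;
        }
    }

    /** Returns the match with the highest score, or null if the list is empty. */
    public static Match best(List<Match> matches) {
        Match winner = null;
        for (Match m : matches) {
            // Assumption: a higher score means a more confident match.
            if (winner == null || m.score > winner.score) {
                winner = m;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Made-up scores, standing in for what the decoder segments would carry.
        List<Match> matches = new ArrayList<>();
        matches.add(new Match("about", -4200));
        matches.add(new Match("rainbow", -1500));
        matches.add(new Match("blood", -3900));

        Match winner = best(matches);
        System.out.println(winner.word);
    }
}
```

With this in place, onResult could react to the single best keyphrase instead of the whole concatenated string from getHypstr().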
This is not the end of the story, though. To implement a well-performing recognizer, one has to follow some rules when choosing keyphrases and then tune the thresholds. As the CMU tutorial says: