android speech-recognition cmusphinx pocketsphinx pocketsphinx-android

How to set up thresholds to spot keywords from a list in pocketsphinx-android?

I would like my Android application to do continuous keyword spotting. I'm modifying the pocketsphinx-android demo to test how I can do it. I wrote this list in a file named en-keywords.txt, picking words from cmudict-en-us.dict:

rainbow /1e-50/
about /1e-50/
blood /1e-50/
energies /1e-50/

In the setupRecognizer method I removed every search and added only this keyword search to the recognizer:

File keywords = new File(assetsDir, "en-keywords.txt");
recognizer.addKeywordSearch(KWS_SEARCH, keywords);

Finally I modified onPartialResult like this:

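public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;

    // Text of the partial match (not used further here).
    String text = hypothesis.getHypstr();

    // Restart the keyword search as soon as any non-null hypothesis arrives.
    switchSearch(KWS_SEARCH);
}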

so that every time a partial result arrives with a non-null hypothesis, onResult is called and the search starts again.

What I see when running the app is not what I expect:

1. onPartialResult gets a non-null hypothesis every time I speak, even if I say something very different from the keywords I'm looking for.
2. Even if I just say "hey", the onPartialResult hypothesis often contains more than one word; in the worst case I say "hey" and the method returns "rainbow about energies blood".
3. onResult is then called, but it shows a Toast with a text different from the last one found by onPartialResult, as if it were a concatenation of strings in some non-obvious order.

I tried different thresholds for the keywords but didn't find my way. I'm probably missing some basic concept or some configuration parameter. Can someone help me with this?

Solution

The solution is to understand how thresholds work and to tune them correctly. I read on the SourceForge forum that the higher the threshold (max 1), the fewer the false alarms (at the risk of missing true matches), and vice versa (min 1e-50). Pocketsphinx takes your threshold and returns a match if the weight of a possible recognition is greater than or equal to it: giving a keyphrase a threshold of 1 means you want that keyphrase in the result only if pocketsphinx is absolutely sure of what has been spoken.

I was using 1e-50, which is a very low threshold that leads to a lot of false alarms: with that threshold almost everything you say will be recognized as one or more of the keywords in your list. This answers points 1 and 2 in my question.
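To make the trade-off concrete, here is a minimal toy sketch in plain Java. It is not pocketsphinx code: the Detection class, the scores and the "spoken" flags are all made up for illustration, and the only thing it models is the "accept if score >= threshold" rule described above.

```java
import java.util.List;

public class ThresholdSweep {
    /** Hypothetical candidate match: keyword, a made-up score in (0, 1], and whether it was really spoken. */
    static final class Detection {
        final String keyword;
        final double score;
        final boolean spoken;

        Detection(String keyword, double score, boolean spoken) {
            this.keyword = keyword;
            this.score = score;
            this.spoken = spoken;
        }
    }

    /** Candidates accepted at this threshold although they were not spoken. */
    static long falseAlarms(List<Detection> detections, double threshold) {
        return detections.stream()
                .filter(d -> d.score >= threshold && !d.spoken)
                .count();
    }

    /** Candidates rejected at this threshold although they were spoken. */
    static long misses(List<Detection> detections, double threshold) {
        return detections.stream()
                .filter(d -> d.score < threshold && d.spoken)
                .count();
    }

    /** Invented sample data, only for illustration. */
    static List<Detection> sample() {
        return List.of(
                new Detection("rainbow", 1e-5, true),
                new Detection("about", 1e-30, false),
                new Detection("blood", 1e-42, false),
                new Detection("energies", 1e-12, true));
    }

    public static void main(String[] args) {
        List<Detection> detections = sample();
        // Sweep from the most permissive threshold to a strict one.
        for (double t : new double[] {1e-50, 1e-20, 1e-10, 1e-3}) {
            System.out.println("threshold=" + t
                    + " falseAlarms=" + falseAlarms(detections, t)
                    + " misses=" + misses(detections, t));
        }
    }
}
```

With these invented scores, 1e-50 accepts both candidates that were never spoken, 1e-20 separates spoken from unspoken cleanly, and 1e-3 rejects everything. Real scores have to come from recordings of your own keyphrases, so the useful range will differ.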

As for my third point, hypothesis.getHypstr() in onResult contains a concatenation of every possible match found. To tell one match from another by looking at their weights, it should be possible to iterate over the segments: recognizer.getDecoder().seg() (see here).
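A rough sketch of what picking one match out of the segments could look like. I have not verified this against the library, so the code below uses a hypothetical stand-in class, Seg, instead of the real segment type; the assumption is that each real segment exposes its matched word and some score, and that a higher score means a more confident match. In the app, the list would be filled by iterating over recognizer.getDecoder().seg().

```java
import java.util.List;

public class SegmentFilter {
    /** Hypothetical stand-in for one decoder segment: matched keyphrase plus a score. */
    static final class Seg {
        final String word;
        final int prob; // assumption: higher value means a more confident match

        Seg(String word, int prob) {
            this.word = word;
            this.prob = prob;
        }
    }

    /** Returns the word of the best-scoring segment, or null if the list is empty. */
    static String bestWord(List<Seg> segments) {
        Seg best = null;
        for (Seg s : segments) {
            if (best == null || s.prob > best.prob) {
                best = s;
            }
        }
        return best == null ? null : best.word;
    }

    public static void main(String[] args) {
        // Invented scores for the concatenated hypothesis "rainbow about blood".
        List<Seg> segments = List.of(
                new Seg("rainbow", -3000),
                new Seg("about", -900),
                new Seg("blood", -4500));
        System.out.println(bestWord(segments));
    }
}
```

Keeping only the best-scoring segment is one possible policy; another would be to keep every segment whose score passes a per-keyphrase cutoff.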

That is not the whole story, though. To implement a well-performing recognizer you have to follow some rules when choosing keyphrases and then tune the thresholds. As the CMU tutorial says:

For the best accuracy it is better to have keyphrase with 3-4 syllables;
Too short phrases are easily confused.
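To check a keyphrase against the syllable guideline above, one rough approach is to count the vowel phones in its dictionary pronunciation, since each syllable has one vowel nucleus. The sketch below is my own helper, not part of pocketsphinx; the class name and the pronunciations passed in main are assumptions typed by hand in ARPAbet style, so check them against your own cmudict-en-us.dict.

```java
import java.util.Locale;
import java.util.Set;

public class KeyphraseCheck {
    // ARPAbet vowel phones; each one is counted as one syllable nucleus.
    private static final Set<String> VOWELS = Set.of(
            "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
            "EY", "IH", "IY", "OW", "OY", "UH", "UW");

    /** Counts syllables in a space-separated phone string such as "R EY N B OW". */
    static int syllables(String pronunciation) {
        int count = 0;
        for (String phone : pronunciation.trim().split("\\s+")) {
            // Drop stress digits (e.g. "EY1") and normalize case before the lookup.
            String base = phone.replaceAll("\\d", "").toUpperCase(Locale.ROOT);
            if (VOWELS.contains(base)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed pronunciations for the four keywords from the question.
        System.out.println("rainbow:  " + syllables("R EY N B OW"));
        System.out.println("about:    " + syllables("AH B AW T"));
        System.out.println("blood:    " + syllables("B L AH D"));
        System.out.println("energies: " + syllables("EH N ER JH IY Z"));
    }
}
```

By this count only "energies" reaches three syllables, and "blood" has a single one, which makes it exactly the kind of short phrase the tutorial warns about. Longer phrases made of several words would be safer candidates.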