Tags: android, speech-recognition, cmusphinx, pocketsphinx, pocketsphinx-android

How to set up thresholds to spot keywords from a list in pocketsphinx-android?


I would like my Android application to do continuous keyword spotting. I'm modifying the pocketsphinx-android demo to test how to do it. I wrote this list in a file named en-keywords.txt, picking words from cmudict-en-us.dict:

rainbow /1e-50/
about /1e-50/
blood /1e-50/
energies /1e-50/

In the setupRecognizer method I removed every search and added only this keyword search to the recognizer:

File keywords = new File(assetsDir, "en-keywords.txt");
recognizer.addKeywordSearch(KWS_SEARCH, keywords);

Finally I modified onPartialResult like this:

public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;

    String text = hypothesis.getHypstr();

    switchSearch(KWS_SEARCH);
}

so that every time a partial result arrives with a non-null hypothesis, onResult is called and the search starts again.

What I see in the app running is not what I'm expecting:

  1. onPartialResult receives a non-null hypothesis every time I speak, even if I say something very different from the keywords I'm looking for;
  2. even if I just say "hey", the onPartialResult hypothesis often contains more than one word; in the worst case I say "hey" and the recognizer hears "rainbow about energies blood";
  3. onResult is then called, but it shows a Toast with text different from the last one found by onPartialResult, as if it were a concatenation of strings in some non-obvious order.

I tried different thresholds for the keywords but couldn't find my way. I'm probably missing some basic concept or configuration parameter. Can someone help me with this?


Solution

  • The solution is to understand how thresholds work and to tune them correctly. I read on the SourceForge forum that the higher the threshold (max 1), the fewer false alarms (at the risk of missing true matches), and vice versa (min 1e-50). Pocketsphinx uses your threshold and returns a match if the weight of a possible recognition is greater than or equal to it: giving a keyphrase a threshold of 1 means you want that keyphrase in the result only if pocketsphinx is absolutely sure of what was spoken.

    I was using 1e-50, which is a very low threshold that leads to a lot of false alarms: with that threshold almost everything you say will be recognized as one or more of the keywords in your list. This answers points 1 and 2 of my question.
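As a rough illustration of the file format used in the question, the sketch below writes a keyword list with one threshold per keyphrase. The class name KeywordListBuilder and the exponent values are made up for this example; they are untuned starting points, not recommended settings. The allowed range of 1e-50 to 1 is taken from the forum statement quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeywordListBuilder {

    // Formats one keyword-list line: the phrase, then its threshold between slashes.
    // The threshold is given as a power of ten, so -20 becomes "1e-20".
    static String formatLine(String phrase, int exponent) {
        // Range assumed from the answer text: min 1e-50, max 1 (that is, 1e0).
        if (exponent < -50 || exponent > 0) {
            throw new IllegalArgumentException("exponent must be between -50 and 0");
        }
        return phrase + " /1e" + exponent + "/";
    }

    // Builds the whole file content, one keyphrase per line, in insertion order.
    static String build(Map<String, Integer> exponents) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : exponents.entrySet()) {
            sb.append(formatLine(e.getKey(), e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative, untuned values: each keyphrase needs its own tuning.
        Map<String, Integer> exponents = new LinkedHashMap<>();
        exponents.put("rainbow", -20);
        exponents.put("about", -10);
        exponents.put("blood", -10);
        exponents.put("energies", -25);
        System.out.print(build(exponents));
    }
}
```

The output can be saved as en-keywords.txt. Each exponent would then be raised to cut false alarms or lowered to catch missed detections for that keyphrase.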

    About my third point: hypothesis.getHypstr() in onResult contains a concatenation of every possible match found. To tell one match from another by looking at weights, it should be possible to iterate over the segments: recognizer.getDecoder().seg() (see here).
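The idea of telling matches apart by weight could look roughly like the sketch below. It runs on its own, so it uses a hypothetical Match class and made-up scores in place of the real decoder segments. In the app, the list would presumably be filled from recognizer.getDecoder().seg(); the getter names shown in the comment (getWord, getProb) are an assumption about the Java bindings and should be checked against the library version in use. Keeping only the best-scoring match is one possible policy, not something the library prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordMatches {

    // Hypothetical stand-in for one decoder segment: the matched word and its score.
    static final class Match {
        final String word;
        final int score;

        Match(String word, int score) {
            this.word = word;
            this.score = score;
        }
    }

    // Returns the match with the highest score, or null if the list is empty.
    static Match best(List<Match> matches) {
        Match best = null;
        for (Match m : matches) {
            if (best == null || m.score > best.score) {
                best = m;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        // In the app (assumed API, not verified here):
        //   for (Segment seg : recognizer.getDecoder().seg())
        //       matches.add(new Match(seg.getWord(), seg.getProb()));
        // Made-up sample scores for illustration only.
        matches.add(new Match("rainbow", -4200));
        matches.add(new Match("about", -1500));
        matches.add(new Match("energies", -3900));

        Match top = best(matches);
        System.out.println(top.word);
    }
}
```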

    That is not the end of it, though. To implement a well-performing recognizer, one has to follow some rules when choosing keyphrases and then perform threshold tuning. As the CMU tutorial says:

    1. For the best accuracy it is better to have keyphrases with 3-4 syllables;
    2. Too short phrases are easily confused.