android speech-recognition pocketsphinx pocketsphinx-android

pocketsphinx android - returning the same word for every spoken word


I downloaded the pocketsphinx-android-demo from GitHub and made some modifications for my own purposes.

I created a new my-en-us.dict (dictionary) file for my app-specific words and added the following entries:

hey HH EY
smarty S M AA R T IY

login L AA G IH N

Then I created a login.gram (grammar) file with the following contents:

#JSGF V1.0;

grammar login;

public <item> = login;

Then I initialized the recognizer in my Activity like this:

public static final String KWS_SEARCH = "wakeup";
public static final String LOGIN_SEARCH = "login";

/* Keyword we are looking for to activate menu */
public static final String KEYPHRASE = "hey smarty";

private void setupRecognizer(File assetsDir) throws IOException {
    // Build the recognizer from the acoustic model and the custom dictionary
    recognizer = SpeechRecognizerSetup.defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "my-en-us.dict"))
            .getRecognizer();
    recognizer.addListener(this);

    // Keyphrase search that waits for the wake-up phrase
    recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

    // Grammar search used after the wake-up phrase is heard
    File loginGrammar = new File(assetsDir, "login.gram");
    recognizer.addGrammarSearch(LOGIN_SEARCH, loginGrammar);
}

The rest of the code (starting the recognizer, listening for words, etc.) is the same as in pocketsphinx-android-demo.

After launching the Android app, I say "hey smarty" to activate recognition of the word "login". When I say "login", it returns "login", but when I say any other word, such as "hello" or "settings", it still returns "login".

I don't know why this is happening. Am I doing something wrong? If so, what is the correct way to add only specific words for accurate recognition?

Another question is, how do I check the accuracy percentage of the "partial result" or "result"?


Solution

  • Presumably you changed the implementation of onPartialResult() to handle a switchSearch(LOGIN_SEARCH) as well.

    The hypothesis is always "login" because that is the only word in your grammar. A grammar search matches whatever it hears to the closest phrase the grammar allows, and it has no rejection threshold (no "kws-threshold") attached to that word, so other words ("hello", "settings") are most likely being misinterpreted as "login".

    For this use case, you want to use addKeywordSearch() instead of a grammar. It is much like addKeyphraseSearch(), but lets you use multiple keywords, each with its own threshold:

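    // Write a keyword list to a temporary file: one "keyword/threshold/" entry per line
    File f = new File( context.getCacheDir(), "temp.gram" );
    PrintWriter p = new PrintWriter( f );
    p.print(
     "hello/1e-10/\n" +
     "login/1e-10/\n" +
     "settings/1e-10/\n"
    );
    p.close();
    // Register the keyword list under the same search name the grammar used
    recognizer.addKeywordSearch( LOGIN_SEARCH, f );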
    

    (I've used a PrintWriter here because addKeywordSearch() requires a file).

    As I mention in this answer, the threshold values will vary for each keyword, and are usually found through experimentation. The values I've provided are notional.

    That should answer your second question, as well: You don't have to check an accuracy percentage (I don't think PocketSphinx even provides one, for keywords) because the threshold is effectively doing that for you.

    Naturally, all your keywords must appear in the dictionary as well.
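
Since the thresholds are tuned per keyword and every keyword has to exist in the dictionary, it may help to generate the keyword list from a map and check it against the dictionary before handing it to the recognizer. The following is only a sketch under assumptions: KeywordListTools and its methods are hypothetical helpers, not part of pocketsphinx-android, and they assume the "keyword/threshold/" line format used above and a dictionary with one "word PHONES..." entry per line.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper class; not part of pocketsphinx-android. */
final class KeywordListTools {

    /** Builds the keyword-list text, one "keyword/threshold/" line per map entry. */
    static String formatKeywordList(Map<String, String> thresholds) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : thresholds.entrySet()) {
            // Fail early (NumberFormatException) if a threshold is not a number.
            Double.parseDouble(e.getValue());
            sb.append(e.getKey().trim())
              .append('/').append(e.getValue()).append("/\n");
        }
        return sb.toString();
    }

    /** Writes the keyword list to a file, since addKeywordSearch() takes a File. */
    static void writeKeywordList(File target, Map<String, String> thresholds)
            throws IOException {
        try (PrintWriter out = new PrintWriter(target, "UTF-8")) {
            out.print(formatKeywordList(thresholds));
        }
    }

    /** Returns every word used in the keyphrases that has no dictionary entry. */
    static Set<String> missingFromDictionary(Collection<String> keyphrases,
                                             List<String> dictLines) {
        Set<String> known = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the dictionary
            }
            // The first token is the word; drop an alternate-pronunciation
            // suffix such as "(2)" if one is present.
            String word = trimmed.split("\\s+")[0];
            known.add(word.replaceAll("\\(\\d+\\)$", ""));
        }
        Set<String> missing = new LinkedHashSet<>();
        for (String phrase : keyphrases) {
            for (String word : phrase.trim().split("\\s+")) {
                // Exact, case-sensitive comparison against dictionary words.
                if (!word.isEmpty() && !known.contains(word)) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }
}
```

With the three dictionary entries from the question (hey, smarty, login), checking the keyphrases "hey smarty", "login", "hello" and "settings" would report "hello" and "settings" as missing, so those two would need pronunciations added to my-en-us.dict first. The file produced by writeKeywordList() could then be passed to recognizer.addKeywordSearch( LOGIN_SEARCH, f ) as in the snippet above.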