
PocketSphinx for an Android dictation app

I'm trying to implement a "dictation" feature using PocketSphinx on Android together with one of Keith Vertanen's language models. I've modified the sample to look like this:

private void setupRecognizer(File assetsDir) throws IOException {
    // Build the recognizer from the acoustic model and dictionary in the assets directory.
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            .setBoolean("-allphone_ci", true)
            .getRecognizer();

    // Register the n-gram language model as a named search.
    File ngramModel = new File(assetsDir, "");
    recognizer.addNgramSearch(NGRAM_SEARCH, ngramModel);
}

where the n-gram model file is from the 5K NVP 2-gram download on Keith Vertanen's site.

I'm getting this error:

01-31 18:04:29.861 2837-2863/? I/SpeechRecognizer: Load N-gram model /storage/emulated/0/Android/data/edu.cmu.sphinx.pocketsphinx/files/sync/
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(399): Trying to read LM in trie binary format
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(410): Header doesn't match
01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count
01-31 18:04:29.862 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(489): Trying to read LM in DMP format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 500: Wrong magic header size number a5c6461: /storage/emulated/0/Android/data/edu.cmu.sphinx.pocketsphinx/files/sync/ is not a dump file
01-31 18:04:29.864 2837-2863/? E/AndroidRuntime: FATAL EXCEPTION: AsyncTask #1
                                                 Process: edu.cmu.sphinx.pocketsphinx, PID: 2837
                                                 java.lang.RuntimeException: An error occurred while executing doInBackground()
                                                     at android.os.AsyncTask$3.done(
                                                     at java.util.concurrent.FutureTask.finishCompletion(
                                                     at java.util.concurrent.FutureTask.setException(
                                                     at android.os.AsyncTask$SerialExecutor$
                                                     at java.util.concurrent.ThreadPoolExecutor.runWorker(
                                                     at java.util.concurrent.ThreadPoolExecutor$
                                                  Caused by: java.lang.RuntimeException: Decoder_setLmFile returned -1
                                                     at edu.cmu.pocketsphinx.PocketSphinxJNI.Decoder_setLmFile(Native Method)
                                                     at edu.cmu.pocketsphinx.Decoder.setLmFile(
                                                     at edu.cmu.pocketsphinx.SpeechRecognizer.addNgramSearch(
                                                     at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.setupRecognizer(
                                                     at edu.cmu.pocketsphinx.demo.PocketSphinxActivity.access$000(
                                                     at edu.cmu.pocketsphinx.demo.PocketSphinxActivity$1.doInBackground(
                                                     at edu.cmu.pocketsphinx.demo.PocketSphinxActivity$1.doInBackground(
                                                     at android.os.AsyncTask$
                                                     at android.os.AsyncTask$SerialExecutor$ 
                                                     at java.util.concurrent.ThreadPoolExecutor.runWorker( 
                                                     at java.util.concurrent.ThreadPoolExecutor$ 

The lines

01-31 18:04:29.861 2837-2863/? I/cmusphinx: INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
01-31 18:04:29.862 2837-2863/? E/cmusphinx: ERROR: "ngram_model_trie.c", line 103: Bad ngram count

make me think that the file isn't correctly formatted. The file looks like this:

ngram 1=5000
ngram 2=4331397
ngram 3=0

-2.11154    </s>    0
-99 <s> -3.13167
-0.3954594  <unk>   -0.4365645
-2.271447   a   -2.953606
-3.384721   a.  -1.85196
-5.788997   a.'s    -0.8137056
-4.139672   abandoned   -0.9728376
-3.904189   ability -1.838658
-4.360272   able    -2.161723

which at least looks like the example file here.

My only other thought was that perhaps the extension is wrong, since this says

Language model can be stored and loaded in three different format - text ARPA format, binary format BIN and binary DMP format. ARPA format takes more space but it is possible to edit it. ARPA files have .lm extension. Binary format takes significantly less space and faster to load. Binary files have .lm.bin extension. It is also possible to convert between formats. DMP format is obsolete and not recommended.

which makes it sound like the file should be named lm_csr_5k_nvp_2gram.lm instead of its current name. I did try renaming the file, but the exception did not change.

What is the correct way to do this?


  • This is an issue with the model format. This line in the n-gram model causes the problem:

    ngram 3=0

    You can either remove the offending line or update pocketsphinx-android-demo; I've just pushed a new version with this issue fixed.

    Overall, dictation on a phone is not trivial because the phone is really slow. I do not recommend using a 2-gram model; it is better to use a heavily pruned 3-gram model. You can prune with SRILM.

    You can also read the optimization doc to learn what else to tune.
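
For anyone who prefers to patch the model file rather than update the demo, the "remove the offending line" option above can be sketched as a small desktop-side Java utility. This is only an illustration under assumptions: the class name ArpaHeaderFix and its methods are hypothetical and not part of PocketSphinx, and it assumes the only problem is a header count line with a zero count such as "ngram 3=0". It streams the file line by line, since the model has millions of 2-gram entries, and copies everything except zero-count header lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of PocketSphinx): drops zero-count header lines from an ARPA LM. */
public class ArpaHeaderFix {

    // Matches header count lines with a zero count, e.g. "ngram 3=0".
    // N-gram entries start with a log probability, so they can never match.
    private static final Pattern ZERO_COUNT =
            Pattern.compile("^ngram\\s+\\d+\\s*=\\s*0\\s*$");

    public static boolean isZeroCountHeader(String line) {
        return ZERO_COUNT.matcher(line).matches();
    }

    /** Copies input to output without zero-count header lines; returns how many lines were removed. */
    public static int strip(Path input, Path output) throws IOException {
        int removed = 0;
        // ISO-8859-1 maps every byte to one char, so the content passes through unchanged
        // whatever the file's real encoding is. Line endings are normalized to "\n".
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (isZeroCountHeader(line)) {
                    removed++;
                    continue;
                }
                writer.write(line);
                writer.write('\n');
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArpaHeaderFix <input.lm> <output.lm>");
            return;
        }
        int removed = strip(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Removed " + removed + " zero-count header line(s)");
    }
}
```

Run it on a development machine against a copy of the model, then put the output file into the app's assets in place of the original. Whether this alone is enough for a given PocketSphinx build is not something the sketch can guarantee; updating pocketsphinx-android-demo, as suggested above, is the other route.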