
ARPA language model documentation


Where can I find documentation on the ARPA language model format?

I am developing a simple speech recognition app with the PocketSphinx STT engine. The ARPA format is recommended there for performance reasons. I want to understand how much I can adjust my language model to fit my custom needs.

All I have found are some very brief descriptions of the ARPA format:

I am a beginner in STT and I have trouble wrapping my head around this (n-grams, etc.). I am looking for more detailed docs, something like the documentation on JSGF grammar here:

http://www.w3.org/TR/jsgf/


Solution

  • There is actually not much more to say about the format than what those docs already say.

    Besides, you will probably want to prepare a text file with sample sentences and generate the language model file from it. There is an online tool that can do this for you: lmtool
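To make the format less abstract, here is a minimal Python sketch that parses a tiny ARPA file and looks up a bigram probability with backoff. The sample model (its words and numbers) is made up for illustration, and `parse_arpa` and `bigram_logprob` are hypothetical helper names, not part of PocketSphinx or lmtool. What the sketch relies on is the general layout of the format: a `\data\` header with n-gram counts, one `\N-grams:` section per order, and lines of the form `log10_probability words [log10_backoff_weight]`, closed by `\end\`.

```python
# Toy ARPA model. All words and log10 values are invented for illustration.
SAMPLE_ARPA = r"""\data\
ngram 1=4
ngram 2=3

\1-grams:
-0.6990 <s> -0.3010
-0.6990 </s>
-0.3979 hello -0.2218
-0.5229 world -0.3010

\2-grams:
-0.1761 <s> hello
-0.3010 hello world
-0.1249 world </s>

\end\
"""


def parse_arpa(text):
    """Return {order: {ngram_tuple: (log10_prob, log10_backoff)}}."""
    model = {}
    order = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the header with the n-gram counts.
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        # A section header such as "\2-grams:" switches the current order.
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            model[order] = {}
            continue
        if order is None:
            continue
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; a missing one is treated as log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        model[order][words] = (logprob, backoff)
    return model


def bigram_logprob(model, prev, word):
    """log10 P(word | prev), backing off to the unigram if the bigram is absent."""
    bigrams = model.get(2, {})
    if (prev, word) in bigrams:
        return bigrams[(prev, word)][0]
    # Backoff: log10 bo(prev) + log10 P(word). Unknown words raise KeyError here.
    backoff = model[1].get((prev,), (0.0, 0.0))[1]
    return backoff + model[1][(word,)][0]


model = parse_arpa(SAMPLE_ARPA)
seen = bigram_logprob(model, "hello", "world")    # listed bigram, read directly
unseen = bigram_logprob(model, "world", "hello")  # not listed, uses backoff
```

In this toy model the pair "hello world" is listed, so its score is read straight from the `\2-grams:` section. The pair "world hello" is not listed, so the score is the backoff weight of "world" plus the unigram score of "hello". Adjusting a model by hand mostly comes down to changing these log10 values, which is why generating the file from sample sentences is usually the more practical route.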