speech-recognition speech-to-text cmusphinx freetts

Building a Phonetic Dictionary with CMUSphinx for a Speech to Text System


I am trying to build a speech-to-text system for a native language, specific to a particular domain, and thought of using CMUSphinx for the purpose. As I understand it, for an uncommon language you first need to build the phonetic dictionary, which includes the English transliteration for the possible set of words:

unicode word -> english transliteration

ex.:

xxxx -> ah ty re see

My question is: do we need to create this transliteration manually? I came across freetts [2], which seems to work well for English. How can I do the same for a new language?


Solution

  • Possible ways to build a dictionary are covered in the CMUSphinx tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorialdict

    There are various tools to help you extend an existing dictionary with new words or build a new dictionary from scratch. If your language already has a dictionary, it is recommended to use it, since it is carefully tuned for best performance. If you are starting a new language, you need to account for various reduction and coarticulation effects, which make it very hard to write accurate rules for converting text to sounds. However, practice shows that even a naive conversion can produce good results for speech recognition. For example, many developers have successfully built ASR systems with a simple grapheme-based approach, where each letter is just mapped to itself rather than to the corresponding phone.

    For most languages you need to use specialized grapheme-to-phoneme (g2p) code, which does the conversion using machine learning methods and a small existing database. Nowadays the most accurate g2p tools are Phonetisaurus and sequitur-g2p.

    Also note that almost every TTS package includes G2P code. For example, you can use the g2p code from FreeTTS, OpenMary or espeak.

    Please note that if you use TTS, you often need to do phoneset conversion, because TTS phonesets are usually more extensive than what ASR requires. However, TTS tools have a great advantage: they usually contain more of the required functionality than simple G2P. For example, they do tokenization, converting numbers and abbreviations to their spoken form.
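To illustrate the phoneset conversion mentioned for TTS-derived pronunciations, here is a minimal sketch in Python. The phone symbols and the `TTS_TO_ASR` table are made up for illustration; a real table has to be written by hand for the specific TTS engine and the phoneset of your acoustic model.

```python
# Hypothetical mapping from a (larger) TTS phone inventory to a smaller
# ASR phoneset. Every symbol here is an assumption for illustration only.
TTS_TO_ASR = {
    "a:": ["aa"],      # long vowel mapped to a single ASR phone
    "a": ["a"],
    "tS": ["ch"],      # affricate renamed
    "ai": ["a", "i"],  # diphthong split into two ASR phones
    "'": [],           # stress marker dropped, ASR does not model it here
}


def convert_phones(tts_phones, mapping=TTS_TO_ASR):
    """Convert a list of TTS phones into a list of ASR phones.

    Raises KeyError for an unmapped phone so that gaps in the table are
    noticed instead of silently producing a broken dictionary entry.
    """
    asr_phones = []
    for phone in tts_phones:
        if phone not in mapping:
            raise KeyError(f"no ASR mapping for TTS phone {phone!r}")
        asr_phones.extend(mapping[phone])
    return asr_phones
```

Failing loudly on unmapped phones is a deliberate choice in this sketch: a phone that is missing from the acoustic model's phoneset would otherwise only show up later as a training or decoding error.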
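The naive grapheme-based approach described in the answer, where each letter is mapped to itself, can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names are hypothetical, the output uses the plain `word phone phone ...` line layout of a Sphinx-style dictionary, and non-ASCII characters are given ASCII-safe names such as `u0928` purely as a convention chosen here.

```python
import unicodedata


def phone_name(ch):
    """Return the pseudo-phone for one character.

    ASCII letters map to themselves (lower-cased); any other letter or
    combining mark gets a name built from its code point, e.g. 'u0928'.
    This naming scheme is an assumption of this sketch.
    """
    if ch.isascii():
        return ch.lower()
    return "u%04x" % ord(ch)


def grapheme_pronunciation(word):
    """Map every letter or combining mark of `word` to its own pseudo-phone."""
    phones = []
    for ch in unicodedata.normalize("NFC", word):
        # Unicode categories starting with 'L' are letters, 'M' are marks
        # (e.g. vowel signs); digits and punctuation are skipped.
        if unicodedata.category(ch)[0] in ("L", "M"):
            phones.append(phone_name(ch))
    return phones


def build_dictionary(words):
    """Return sorted, de-duplicated dictionary lines: 'word p1 p2 ...'."""
    lines = set()
    for word in words:
        phones = grapheme_pronunciation(word)
        if phones:  # skip entries that contain no letters at all
            lines.add(word.lower() + " " + " ".join(phones))
    return sorted(lines)
```

For example, `build_dictionary(["cat", "ab"])` yields the lines `ab a b` and `cat c a t`. The set of pseudo-phones produced this way would also have to be used as the phoneset when training the acoustic model.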