I am working on a web speech recognition application. I am using recorderJS to capture the sound and send it to the backend where it should be processed using CMU Sphinx.
While exploring the library I had accuracy problems with the latest version, 5prealpha, first with the default acoustic model, language model and dictionary, and later even after reducing the number of recognized words with a JSGF grammar, so I switched to the 1.0 beta6 version.
Microphone recognition with the 1.0 beta6 version is pretty accurate. However, when I transcribe recorded sound the accuracy is always poor. How can I improve it? I also tried the StreamSpeechRecognizer from the latest version, but it gives poor results as well.
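For reference, the file transcription attempt with 5prealpha looks roughly like this; the model paths follow the sphinx4 tutorial and "recording.wav" stands for the file uploaded from the browser, so adjust them to your setup:

import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscribeFile {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Default en-us acoustic model, dictionary and language model bundled with sphinx4
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // "recording.wav" is a placeholder for the audio received from recorderJS
        InputStream stream = new FileInputStream("recording.wav");
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}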
I managed to get good accuracy. I checked the implementation of the edu.cmu.sphinx.frontend.util.Microphone class and found that it expects a sample rate of 16,000 Hz, a sample size of 16 bits, and a single channel.
I looked further into recorderJS and found that in Google Chrome the sample rate was 44,100 Hz, so I searched for a configurable version of the library and found Chris Rudmin's fork of Matt Diamond's RecorderJS.
I didn't use the latest version of the fork because it exports the sound in the Ogg format and I need WAV, so I looked through previous releases and used version 0.3, where the bit rate is configurable, and it worked fine.
I later modified the example that comes with it, and the following parameters gave good accuracy:
monitor gain: 0
bitDepth: 16
number of channels: 1
recordOpus: unchecked
sample rate: 16000
bit rate: 32000
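To double-check that the exported WAV actually matches what Sphinx expects, the backend can inspect the file header with javax.sound.sampled; this is just a quick sanity check, and "recording.wav" is again a placeholder for the exported file:

import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

public class CheckWavFormat {
    public static void main(String[] args) throws Exception {
        // Read only the header of the exported recording and print its format
        AudioFormat format = AudioSystem.getAudioFileFormat(new File("recording.wav")).getFormat();
        System.out.println("Sample rate:     " + format.getSampleRate());       // should be 16000.0
        System.out.println("Bits per sample: " + format.getSampleSizeInBits()); // should be 16
        System.out.println("Channels:        " + format.getChannels());         // should be 1
        System.out.println("Big endian:      " + format.isBigEndian());         // should be false
    }
}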
This is the configuration of the stream data source in CMU Sphinx's XML configuration file:

<component name="streamDataSource"
           type="edu.cmu.sphinx.frontend.util.StreamDataSource">
    <property name="sampleRate" value="16000" />
    <property name="bitsPerSample" value="16" />
    <property name="bigEndianData" value="false" />
    <property name="signedData" value="true" />
</component>
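And this is roughly how the transcription is wired up against that configuration with the 1.0 beta6 API. It is only a sketch: "config.xml", "recording.wav" and the "recognizer" component name are assumptions based on the standard sphinx4 configuration files, so adapt them to your own setup.

import java.io.File;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

import edu.cmu.sphinx.frontend.util.StreamDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class Transcribe {
    public static void main(String[] args) throws Exception {
        // "config.xml" is the configuration file that contains the streamDataSource component above
        ConfigurationManager cm = new ConfigurationManager(Transcribe.class.getResource("config.xml"));
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Let javax.sound parse the WAV header so only raw PCM reaches the Sphinx front end
        AudioInputStream audio = AudioSystem.getAudioInputStream(new File("recording.wav"));
        StreamDataSource dataSource = (StreamDataSource) cm.lookup("streamDataSource");
        dataSource.setInputStream(audio, "recording.wav");

        // Decode utterances until the stream is exhausted
        Result result;
        while ((result = recognizer.recognize()) != null) {
            System.out.println(result.getBestFinalResultNoFiller());
        }
        recognizer.deallocate();
    }
}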