mp3 recognition using Sphinx 4


Can we use mp3 files for the voice recognition process without using wav files? Or can we generate a wav file from an mp3 and then do the voice recognition without a serious impact on accuracy? The problem is that I need to minimize the load transferred over the network in my application. Will the information lost in the conversion be a major factor for accuracy?


Solution

  • Can we use mp3 files for the voice recognition process without using wav files?

    Not directly. To recognize mp3 streams, you need a Java library that reads mp3 and converts it to a PCM stream (tritonus-mp3, lameonj). You can also invoke ffmpeg as a separate process to do the decoding.

  • Or can we generate a wav file from an mp3 and then do the voice recognition without a serious impact on the accuracy?

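    Accuracy is affected in both cases, no matter where you decode the mp3 file: the information is discarded when the audio is encoded to mp3, so decoding it back to wav cannot restore it.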

  • The problem is I need to minimize the load transferred through the network in my application. Will the information which is lost in the conversion be a huge factor for accuracy?

    It's better to use a lossless codec like FLAC for the transfer, because mp3 compression degrades ASR accuracy. Another approach would be to compute the acoustic features on the client and transfer them to the server instead of the audio.
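    The ffmpeg route mentioned in the first answer can be sketched in Java as follows. This is a minimal sketch, assuming ffmpeg is on the PATH and Java 9 or later; the class and method names (`Mp3ToWav`, `ffmpegCommand`, `decode`) are hypothetical. It resamples to 16 kHz mono 16-bit PCM on the assumption that the recognizer uses a 16 kHz acoustic model such as the default Sphinx4 English model; change `-ar` if your model expects a different rate.

```java
import java.io.IOException;
import java.util.List;

public final class Mp3ToWav {
    // Builds an ffmpeg command that decodes an mp3 into a WAV file with
    // 16 kHz, mono, 16-bit signed little-endian PCM samples.
    // Assumption: the target acoustic model is a 16 kHz model.
    static List<String> ffmpegCommand(String mp3Path, String wavPath) {
        return List.of(
                "ffmpeg",
                "-y",                    // overwrite the output file without asking
                "-i", mp3Path,           // input mp3
                "-ar", "16000",          // resample to 16 kHz
                "-ac", "1",              // downmix to mono
                "-acodec", "pcm_s16le",  // 16-bit signed little-endian PCM
                "-f", "wav",             // WAV container
                wavPath);
    }

    // Runs ffmpeg as a separate process and waits for it to finish.
    // stderr is merged into stdout and discarded so the pipe cannot fill up.
    static void decode(String mp3Path, String wavPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(ffmpegCommand(mp3Path, wavPath))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("ffmpeg exited with code " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Mp3ToWav input.mp3 output.wav
        decode(args[0], args[1]);
    }
}
```

    The resulting WAV file can then be fed to the recognizer like any other recording. Decoding and resampling only change the container and sample format; they do not bring back what the mp3 encoder removed.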
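    To gauge the bandwidth trade-off discussed in the last answer, here is a back-of-the-envelope sketch in Java. The parameters are assumptions, not values from the answer: 16 kHz 16-bit mono PCM for raw audio, and 13 cepstral coefficients per frame at a 10 ms frame shift (100 frames per second) with a 25 ms window, each coefficient sent as a 4-byte float, which is a common MFCC setup for Sphinx-style front ends. The class and method names are hypothetical.

```java
public final class PayloadEstimate {
    // Raw PCM payload: samples per second * bytes per sample * channels.
    static long pcmBytesPerSecond(int sampleRate, int bitsPerSample, int channels) {
        return (long) sampleRate * (bitsPerSample / 8) * channels;
    }

    // Compressed stream (mp3, FLAC, ...) at a given bitrate in kbit/s.
    static long compressedBytesPerSecond(int kbps) {
        return (long) kbps * 1000 / 8;
    }

    // Feature payload: frames per second * coefficients per frame * bytes each.
    static long featureBytesPerSecond(int framesPerSecond, int coefficients,
                                      int bytesPerCoefficient) {
        return (long) framesPerSecond * coefficients * bytesPerCoefficient;
    }

    // Number of full analysis windows that fit in numSamples,
    // advancing by shiftSamples each time.
    static int frameCount(int numSamples, int windowSamples, int shiftSamples) {
        if (numSamples < windowSamples) {
            return 0;
        }
        return 1 + (numSamples - windowSamples) / shiftSamples;
    }

    public static void main(String[] args) {
        // 16 kHz, 16-bit, mono
        System.out.println("PCM bytes/s:        " + pcmBytesPerSecond(16000, 16, 1));   // 32000
        // hypothetical 32 kbit/s mp3
        System.out.println("32 kbit/s bytes/s:  " + compressedBytesPerSecond(32));      // 4000
        // 13 float coefficients at 100 frames/s
        System.out.println("Feature bytes/s:    " + featureBytesPerSecond(100, 13, 4)); // 5200
        // 1 s of 16 kHz audio, 25 ms window (400 samples), 10 ms shift (160 samples)
        System.out.println("Frames in 1 s:      " + frameCount(16000, 400, 160));       // 98
    }
}
```

    Under these assumptions, uncompressed float features take about one sixth of the raw PCM bandwidth. That is in the same range as a low-bitrate mp3, but the features are computed from the original audio, so they carry no mp3 coding artifacts.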