java audio ffmpeg speech-recognition cmusphinx

Convert audio files for CMU Sphinx 4 input


I have a big batch of files I'd like to run recognition on using CMU Sphinx 4. Sphinx requires the following format: 16 kHz, 16-bit, mono, little-endian.

My files are something like 44.1 kHz, 32-bit stereo mp3 files. I tried using Tritonus, and then its updated version JavaZoom, to convert using code from bakuzen. However, AudioSystem.getAudioInputStream(File) throws an UnsupportedAudioFileException, and I haven't been able to figure out why, so I have moved on.

Now I am trying ffmpeg. The command ffmpeg -i input.mp3 -ac 1 -ab 16 -ar 16000 output.wav seems like it should do the trick (except for little endian), but when I check the output with Audacity, it still labels it as "32-bit float". The command I found on this site also uses -acodec pcm_s16le, which from its name seems to be outputting 16 bit little endian; however, Audacity still tells me the output is 32 bit float.

Can anyone tell me how to convert audio files into the format required by CMU Sphinx 4?


Solution

  • Did you actually try the output from ffmpeg in CMU Sphinx 4? 32-bit float is probably your default sampling format in Audacity (Edit > Preferences > Quality). I'm guessing it converts any imported file to these settings, so it may not be reporting the parameters of the actual file, but perhaps the working file in Audacity.

    Remove -ab 16. That option sets the audio bitrate (here, 16 bits/s), not the sample bit depth, and ffmpeg ignores it for pcm_s16le anyway. So your command will look like:

    ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
    

    To convert all mp3 files in a directory in Linux:

    for f in *.mp3; do ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "${f%.mp3}.wav"; done
    

    Or Windows:

    for /r %i in (*.mp3) do ffmpeg -i "%i" -acodec pcm_s16le -ac 1 -ar 16000 "%i.wav"
    

    In a Windows batch file (the loop variable needs a doubled percent sign everywhere it appears):

    for /r %%i in (*.mp3) do ffmpeg -i "%%i" -acodec pcm_s16le -ac 1 -ar 16000 "%%i.wav"
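
Since the question is tagged java, the same batch conversion could also be driven from Java by shelling out to ffmpeg. This is only a rough sketch: the class name BatchConvert and the helper ffmpegCommand are made up for illustration, and it assumes ffmpeg is on the PATH and that the inputs end in .mp3. The ffmpeg flags are the same ones used in the command above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class, not part of Sphinx 4 or ffmpeg.
class BatchConvert {

    // Builds the ffmpeg argument list for one input file:
    // 16-bit signed little-endian PCM, mono, 16000 Hz, written next to the input.
    static List<String> ffmpegCommand(Path input) {
        String name = input.getFileName().toString();
        String base = name.endsWith(".mp3") ? name.substring(0, name.length() - 4) : name;
        Path output = input.resolveSibling(base + ".wav");
        return List.of("ffmpeg", "-i", input.toString(),
                "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000",
                output.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Directory to scan; defaults to the current directory (assumption).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> mp3s = Files.newDirectoryStream(dir, "*.mp3")) {
            for (Path mp3 : mp3s) {
                // inheritIO() forwards ffmpeg's console output to this process.
                int exit = new ProcessBuilder(ffmpegCommand(mp3)).inheritIO().start().waitFor();
                if (exit != 0) {
                    System.err.println("ffmpeg failed for " + mp3);
                }
            }
        }
    }
}
```

Unlike the for /r loops, this sketch does not recurse into subdirectories, and it strips the .mp3 extension the way the Linux loop does.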
    

    You can see file information with file, ffmpeg, ffprobe, mediainfo among other utilities:

    $ file hjl0bC.wav 
    hjl0bC.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
    
    $ ffmpeg -i hjl0bC.wav
    [...]
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
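
If you would rather check from Java than trust what Audacity displays, the standard javax.sound.sampled API can read the header of a WAV file. The sketch below is an illustration, not something taken from Sphinx 4: the class name WavFormatCheck and the method isSphinxReady are made up, and the target of 16 kHz, 16-bit, mono, signed little-endian PCM is assumed from the ffmpeg flags used above.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper class for illustration only.
class WavFormatCheck {

    // True if the format is 16000 Hz, 16-bit, mono, signed little-endian PCM
    // (the target assumed here; adjust if your Sphinx configuration differs).
    static boolean isSphinxReady(AudioFormat fmt) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                && fmt.getSampleRate() == 16000f
                && fmt.getSampleSizeInBits() == 16
                && fmt.getChannels() == 1
                && !fmt.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        // Reads only the file header; the audio data itself is not decoded.
        File wav = new File(args[0]);
        AudioFormat fmt = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println(fmt + " -> " + (isSphinxReady(fmt) ? "OK" : "needs conversion"));
    }
}
```

Run it on the ffmpeg output, for example java WavFormatCheck.java output.wav on a JDK that supports single-file source launch. This reports the parameters stored in the file itself, independent of any project settings in an editor.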