speech-recognitionsamplingwavemicrosoft-speech-platform

Microsoft Speech Platform - sampling rate and bit depth


Recognition results are best if sampling rate and bit depth of the audio match the training data of the system.

So, does anyone know the exact sampling rate and/or bit depth (and/or stereo/mono) that is used in Microsoft Speech Platform (newest, if that's important)? And if so, do you remember where you got this information?

Please note that I am using the MS Speech Platform, not the SAPI. Unless both are using the same training data, that's not the same AFAIK. To be precise - I use this: http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.speechrecognitionengine.setinputtowavefile%28v=office.14%29.aspx

My first try is based upon the C++ code example given on the page.


Solution

  • The Microsoft.Speech SR engine doesn't need training (unlike the System.Speech SR engine), and is relatively insensitive to sampling rate (will work with anything > 8 KHz sampling rate). 16 bit audio is preferred, but I believe that it will work with 8 bit audio.