Google Speech-to-Text not working correctly with very short audio (single words)


I'm testing the Google Speech-to-Text API with streaming audio as well as with WAV files. I'm using telephony audio: 8000 Hz sample rate, 8-bit, mu-law encoding. The Google configuration is set to match.

When I test it with normal sequences of words, it returns a correct transcription. However, when I say a single word (especially a number), I very often get no response from the API, as if there were no input. This happens with both streaming and batch transcription.

Does anybody know why this is happening and how to fix it?


Solution

  • The Cloud Speech-to-Text API best practices recommend a lossless codec such as FLAC or LINEAR16. I tested with LINEAR16, and single-word utterances that are digits were transcribed correctly. So the solution is to transcode the audio to LINEAR16 before sending it to the API.
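
One way to do the transcoding, as a minimal sketch: decode the 8-bit mu-law bytes to 16-bit little-endian PCM with the standard G.711 expansion, using only the Python standard library. The function names `ulaw_byte_to_linear` and `ulaw_to_linear16` are made up for this illustration and are not part of any Google library. It assumes the input is raw mu-law sample data (no WAV header), and that the request config is then changed to declare LINEAR16 at the same 8000 Hz sample rate.

```python
import struct

BIAS = 0x84  # G.711 mu-law bias (132)


def ulaw_byte_to_linear(u: int) -> int:
    """Decode one 8-bit mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80              # top bit: set means negative
    exponent = (u >> 4) & 0x07   # 3-bit segment number
    mantissa = u & 0x0F          # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# Precompute all 256 possible values so decoding is a table lookup.
ULAW_TABLE = [ulaw_byte_to_linear(b) for b in range(256)]


def ulaw_to_linear16(ulaw_bytes: bytes) -> bytes:
    """Convert raw mu-law audio to 16-bit little-endian PCM (LINEAR16)."""
    samples = [ULAW_TABLE[b] for b in ulaw_bytes]
    return struct.pack("<%dh" % len(samples), *samples)
```

For streaming, the same function can be applied to each chunk before it is sent, since every mu-law byte decodes independently of its neighbours. The output is twice the size of the input (two bytes per sample instead of one).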