
What audio file types does Google Cloud Speech API recognize?


I'm trying to use Google's Cloud Speech API. There's documentation and code examples here:

https://cloud.google.com/speech/docs/basics
https://cloud.google.com/speech/docs/rest-tutorial

The sample code runs fine when I point it at the included file, audio.raw, but not when I give it a short .wav file of my own.

I have no idea what format the sample audio file is in:

$ file audio.raw 
audio.raw: data

With my .wav file, which has maybe 10 seconds of audio, I get an empty result.

I'm aware of this answer.

google cloud speech api returning empty result

My question has been asked before, but it was never actually answered.

What types of audio are supported by Cloud Speech API?

I can't imagine that I would have to get the properties of the audio file just right to get this to work. I assume my use case is a common one: someone records a meeting, has no idea what the recording parameters are, and just wants a text file.


Solution

  • EDIT May 2020: things seem to have improved and this answer is no longer correct. See the new docs for details about supported formats (including WAV).


    As of 2016, the WAV format does not seem to be supported. The formats that are documented as supported are listed here:

    https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig#AudioEncoding
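
If the .wav file holds plain 16-bit PCM, one possible workaround is to drop the WAV container header and send only the bare samples, which should correspond to the LINEAR16 encoding listed on that page. Here is a minimal sketch using Python's standard wave module. The function name wav_to_raw_pcm and the file names are made up for illustration, and the code assumes the input is uncompressed 16-bit PCM; it does not resample or downmix anything.

```python
import wave


def wav_to_raw_pcm(wav_path, raw_path):
    """Copy the PCM samples of a WAV file into a headerless .raw file.

    Returns (sample_rate_hz, channels, bytes_per_sample) so the caller can
    make the recognition request describe the audio accurately.
    Assumption: the WAV file contains uncompressed 16-bit PCM.
    """
    with wave.open(wav_path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample

        # LINEAR16 means 16-bit samples, i.e. 2 bytes each.
        if sample_width != 2:
            raise ValueError(
                "expected 16-bit samples, got %d-bit" % (sample_width * 8)
            )

        # readframes() returns only the sample data, without the header.
        frames = wav.readframes(wav.getnframes())

    with open(raw_path, "wb") as raw:
        raw.write(frames)

    return sample_rate, channels, sample_width


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    rate, channels, width = wav_to_raw_pcm("meeting.wav", "meeting.raw")
    print("sample rate: %d Hz, channels: %d, bytes/sample: %d"
          % (rate, channels, width))
```

The printed values matter: the sample rate declared in the request has to match the actual recording, and a mismatch there is a plausible cause of an empty result. If the recording is stereo or uses a different sample width, it would need converting with an audio tool first.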