I have a WAV file generated from a stream using WebRTC. The sample demo here is able to transcribe it, but my code fails and returns an empty response. Here's my config:
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.OGG_OPUS,
    sample_rate_hertz=48000,
    language_code="es-US",
    audio_channel_count=2,
    enable_separate_recognition_per_channel=True,
    use_enhanced=True,
    model="command_and_search",
)
I tested your file using the configuration you provided and I get blank results as well. I'm not sure what code or API version the backend of the "Try it" demo uses that makes your audio file work seamlessly. As a workaround, I converted your file to FLAC and it worked.
To convert the file I used FFmpeg. You can use any audio converter tool you have, as long as it properly converts the file to FLAC. See command:
ffmpeg -i hola.wav hola.flac
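If you want to run the same conversion from Python (for example, right after saving the WebRTC stream), you can shell out to FFmpeg. This is just a sketch: `wav_to_flac` and `build_ffmpeg_cmd` are hypothetical helper names, and it assumes the `ffmpeg` binary is on your PATH:

```python
import shutil
import subprocess

def build_ffmpeg_cmd(src, dst):
    # Same invocation as above; -y overwrites the output file if it exists.
    return ["ffmpeg", "-y", "-i", src, dst]

def wav_to_flac(src, dst):
    # Hypothetical helper: runs FFmpeg and raises if the conversion fails.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```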
Using the converted file, I changed the audio encoding in the config to FLAC and it worked fine. See code below:
def transcribe_file(speech_file):
    from google.cloud import speech
    import io

    client = speech.SpeechClient()

    with io.open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        audio_channel_count=2,
        language_code="en-US",
        model="command_and_search",
    )
    response = client.recognize(config=config, audio=audio)
    print(response)
    for result in response.results:
        print(u"Transcript: {}".format(result.alternatives[0].transcript))

transcribe_file("./hola.flac")
Output:
Also for reference: if you get an empty result and you have already tried optimizing the audio (for example, splitting it into mono) and it still fails, try converting the file to FLAC as suggested in the troubleshooting docs.