I have a sample .webm file recorded using MediaRecorder in Chrome Browser. When I use Google speech java client to get transcription for the video, it returns empty transcription. Here is what my code looks like
SpeechSettings settings = null;
Path path = Paths.get("D:\\scrap\\gcp_test.webm");
byte[] content = null;
try {
content = Files.readAllBytes(path);
settings = SpeechSettings.newBuilder().setCredentialsProvider(credentialsProvider).build();
} catch (IOException e1) {
throw new IllegalStateException(e1);
}
try (SpeechClient speech = SpeechClient.create(settings)) {
// Builds the request for remote FLAC file
RecognitionConfig config = RecognitionConfig.newBuilder()
.setEncoding(AudioEncoding.LINEAR16)
.setLanguageCode("en-US")
.setUseEnhanced(true)
.setModel("video")
.setEnableAutomaticPunctuation(true)
.setSampleRateHertz(48000)
.build();
RecognitionAudio audio = RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
// RecognitionAudio audio = RecognitionAudio.newBuilder().setUri("gs://xxxx/gcp_test.webm") .build();
// Use blocking call for getting audio transcript
RecognizeResponse response = speech.recognize(config, audio);
List<SpeechRecognitionResult> results = response.getResultsList();
for (SpeechRecognitionResult result : results) {
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcription: %s%n", alternative.getTranscript());
}
} catch (Exception e) {
e.printStackTrace();
System.err.println(e.getMessage());
}
If, I use the same file and visit https://cloud.google.com/speech-to-text/ and upload file in the demo section. It seems to work fine and shows transcription. I am clueless about whats going wrong here. I verified the request sent by demo and here it what looks like
I am sending the exact set of parameters, but that didn't work. Tried uploading file to Cloud storage but that too gave same result (no transcription).
After going through error and trials (and looking at the javascript samples), I could solve the issue. The serialized version of audio should be in FLAC format. I was sending the video file(webm) as is to Google Cloud. The demo on the site extracts audio stream using Javascript Audio API and then sents the data in base64 format to make it work.
Here are the steps that I executed to get the output.
Used FFMPEG to extract audio stream into FLAC format from webm.
ffmpeg -i sample.webm -vn -acodec flac sample.flac
The extracted file should be made available using either Storage cloud or send as ByteString.
Set the appropriate model while calling the speech API (for english language video
model works, while for french language command_and_search
). I don't have any logical reason for this. I realised it after trial and error with demo on Google cloud site.