I am creating a Unity WebGL build and need to use the Google Cloud Speech-to-Text API in my app. Unity does not support microphones in WebGL builds, but there is a workaround that uses jslib files to access the microphone through JavaScript. The problem is that, no matter what I try or where I look across the web, I cannot find documentation on how to submit this audio to Google Cloud for processing with an HTTP POST request in plain JavaScript (I cannot easily use other methods or libraries, and I don't want this code to be too complicated). I created an API key (it might not be necessary...), but every request I send, such as the one below, returns Error Code 400 Bad Request or similar:
fetch("https://speech.googleapis.com/v1/speech:recognize?key=API_KEY", {
    method: "POST",
    body: JSON.stringify(payload),
    headers: {
        "Content-Type": "application/json"
    }
})
.then(response => response.json())
.then(data => {
    // Process the response
    processResponse(data);
})
.catch(error => {
    console.error('Error:', error);
});
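When a Google API rejects a request, the response body normally carries a JSON `error` object with `code`, `status`, and `message` fields, and the `message` usually names the offending field. The sketch below is one way to surface it instead of only seeing "400". The names `describeApiError` and `recognize` are hypothetical helpers, not part of any library.

```javascript
// Turn a parsed response body into a readable error string, or null if the
// body does not contain an error object.
// Google APIs report failures as { error: { code, message, status } }.
function describeApiError(httpStatus, body) {
  if (body && body.error) {
    const { code, status, message } = body.error;
    return `HTTP ${httpStatus} (${status || code}): ${message}`;
  }
  return null;
}

// Hypothetical wrapper around the fetch call above: it parses the body even
// for non-2xx responses and rejects with the server's own explanation.
function recognize(url, payload) {
  return fetch(url, {
    method: 'POST',
    body: JSON.stringify(payload),
    headers: { 'Content-Type': 'application/json' }
  }).then(response =>
    response.json().then(body => {
      const problem = describeApiError(response.status, body);
      if (problem) throw new Error(problem);
      return body;
    })
  );
}
```

Logging the rejected error from `recognize(...)` in the `.catch` handler should then show which part of the payload the API objected to.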
Heck, I even tried asking ChatGPT 4 and got no answer. I admit I am not a JavaScript person, let alone an expert, so if anyone is familiar with creating such requests, please share your knowledge with me. Thank you!
EDIT: Since it appears to be somewhat unclear, this is the full code (I don't care about conventions or styling at the moment, I need the core functionality to work first):
StartRecording: function () {
    console.log("Beginning of StartRecording");

    // Function to send audio data to Google Cloud Speech-to-Text API
    var sendToSpeechToText = function (blob) {
        console.log("Beginning of SendToSpeechToText");
        const apiKey = '<REMOVED>'; // Replace with your Google Cloud API key
        const url = `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`;
        const reader = new FileReader();
        reader.onload = function() {
            const base64data = reader.result;
            const audioBytes = base64data.split('base64,')[1];
            const requestData = {
                config: {
                    encoding: 'WEBM_OPUS',
                    sampleRateHertz: 16000,
                    languageCode: 'en-US'
                },
                audio: {
                    content: audioBytes
                }
            };
            fetch(url, {
                method: 'POST',
                body: JSON.stringify(requestData),
                headers: {
                    'Content-Type': 'application/json'
                }
            })
            .then(response => response.json())
            .then(data => {
                console.log("Data Received!");
                // Process the response data (transcript)
                window.alert(data["results"]["0"]["alternatives"]["0"]["transcript"]);
            })
            .catch(error => console.error('Error:', error));
        };
        console.log("End of SendToSpeechToText");
        reader.readAsDataURL(blob);
    };

    var handleSuccess = function(stream) {
        console.log("Beginning of HandleSuccess");
        const options = {
            mimeType: 'audio/webm'
        };
        const recordedChunks = [];
        const mediaRecorder = new MediaRecorder(stream, options);
        mediaRecorder.addEventListener('dataavailable', function(e) {
            if (e.data.size > 0) recordedChunks.push(e.data);
        });
        mediaRecorder.addEventListener('stop', function() {
            sendToSpeechToText(new Blob(recordedChunks));
        });
        mediaRecorder.start();
        // For example, stop recording after 5 seconds
        setTimeout(() => {
            mediaRecorder.stop();
        }, 5000);
        console.log("End of HandleSuccess");
    };

    navigator.mediaDevices.getUserMedia({ audio: {
        deviceId: "default",
        sampleRate: 16000,
        sampleSize: 16,
        channelCount: 1
    }, video: false })
    .then(handleSuccess);
    console.log("End of StartRecording");
}
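Two things in the code above may be worth checking, though neither is confirmed as the cause of the 400. First, browsers commonly encode Opus at 48000 Hz regardless of the `sampleRate` constraint passed to `getUserMedia`, so a hard-coded `sampleRateHertz: 16000` may not match the recorded audio. Second, `data["results"]["0"]` throws when the response has no `results` (for example, on silence or an error body). The sketch below parameterizes the rate and reads the transcript defensively; `buildRecognizeRequest` and `extractTranscript` are hypothetical helper names.

```javascript
// Build the JSON body for speech:recognize.
// sampleRateHertz is passed in rather than hard-coded so it can match the
// rate the recorder actually used (assumption: often 48000 for browser Opus).
function buildRecognizeRequest(audioBase64, sampleRateHertz) {
  return {
    config: {
      encoding: 'WEBM_OPUS',
      sampleRateHertz: sampleRateHertz,
      languageCode: 'en-US'
    },
    audio: { content: audioBase64 }
  };
}

// Collect the top transcript of every result and join them with spaces.
// Returns '' when the response has no results instead of throwing.
function extractTranscript(data) {
  const results = (data && data.results) || [];
  return results
    .map(r => (r.alternatives && r.alternatives[0] && r.alternatives[0].transcript) || '')
    .join(' ')
    .trim();
}
```

In `reader.onload` this would be used as `buildRecognizeRequest(audioBytes, 48000)`, and in the response handler as `window.alert(extractTranscript(data))`. Requesting `mimeType: 'audio/webm;codecs=opus'` from `MediaRecorder` and passing `{ type: 'audio/webm' }` to the `Blob` constructor may also help keep the container and codec explicit.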
I also tried adding Authorization: 'Bearer ${apiKey}' to the headers instead of supplying the API key in the URL, but got the same result.
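A note on that header attempt: a single-quoted `'Bearer ${apiKey}'` is not a template literal, so the literal text `${apiKey}` would be sent. Beyond that, the `Bearer` scheme expects an OAuth 2.0 access token, and an API key is a different kind of credential that is normally passed as the `key` query parameter (or the `X-Goog-Api-Key` header). A minimal sketch of keeping the two apart, where `buildAuth` and its `credentials` object are hypothetical:

```javascript
// Build the URL and headers for one of two credential types.
// credentials.accessToken: an OAuth 2.0 access token (sent as a Bearer header).
// credentials.apiKey: an API key (sent as the "key" query parameter).
function buildAuth(baseUrl, credentials) {
  const headers = { 'Content-Type': 'application/json' };
  if (credentials.accessToken) {
    // Backticks are required here so the token is actually interpolated.
    headers['Authorization'] = `Bearer ${credentials.accessToken}`;
    return { url: baseUrl, headers };
  }
  return {
    url: `${baseUrl}?key=${encodeURIComponent(credentials.apiKey)}`,
    headers
  };
}
```

The returned `url` and `headers` can be passed straight to `fetch` along with the JSON body.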
I'm leaving this here in case someone faces the same use case in the future:
While I could not find the answer I wanted, I did find a workaround: a local Node.js server. It adds a layer of complexity (and another service that has to be maintained), but it let me perform the task I wanted.
I post the request to the local Node.js server, which reads the base64-encoded audio data and the parameters for the Google Cloud request, obtains credentials using a service account I set up, sends the request to the Google Cloud Speech-to-Text API, and awaits the result. When a response is received, the server propagates it back as the response to the original POST request.
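The flow described above could look roughly like the sketch below. It is not the actual server from this post: it assumes Node.js 18+ (for the global `fetch`), assumes the service account yields an OAuth 2.0 access token, and takes a hypothetical `getAccessToken` function from the caller so the token source (for example, a Google auth library) stays out of the example. CORS handling, which a browser page on another origin would need, is omitted.

```javascript
const http = require('http');

const SPEECH_URL = 'https://speech.googleapis.com/v1/speech:recognize';

// Build the fetch options for the upstream call from the client's payload
// ({ config, audio }) and an access token.
function buildUpstreamRequest(payload, accessToken) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${accessToken}`
    },
    body: JSON.stringify({ config: payload.config, audio: payload.audio })
  };
}

// getAccessToken is a hypothetical async function supplied by the caller
// that resolves to a token obtained with the service account.
function createProxy(getAccessToken) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405);
      res.end();
      return;
    }
    let raw = '';
    req.on('data', chunk => { raw += chunk; });
    req.on('end', async () => {
      try {
        const token = await getAccessToken();
        const upstream = await fetch(SPEECH_URL, buildUpstreamRequest(JSON.parse(raw), token));
        const text = await upstream.text();
        // Propagate Google's status and body back to the original caller.
        res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
        res.end(text);
      } catch (err) {
        res.writeHead(502, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ error: { message: String(err) } }));
      }
    });
  });
}
```

Starting it would be something like `createProxy(myTokenProvider).listen(3000)`, with the WebGL page posting its `{ config, audio }` JSON to that port instead of to Google directly.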