
Does the Web Speech API in Chrome (and Edge) use an offsite server for STT?


Recently I found the Web Speech API as well as a simple HTML/JS speech-to-text example from Google.

I started playing with the API and modified the JS and HTML in that example to see how it performed. No network activity shows up in the browser's debugging console. However, since speech-to-text is a native browser API, I wonder whether the browser itself sends requests to a third-party server to convert speech to text, or whether recognition is truly built into the browser and would work in a completely offline environment.

Question: Is the Web Speech API completely private, such that all speech-to-text conversion happens on the local machine, or does it make remote requests?

(I realize that this is perhaps only tangentially related to programming in JS/HTML, so if it is off-topic, please point me to where on SE this question should be asked so I can close and move it.)


Solution

  • The answer to your question can be found in the "Where does the audio go?" section of https://wiki.mozilla.org/Web_Speech_API_-_Speech_Recognition: "Firefox can specify which server receives the audio data inputted by the users. Currently we are sending audio to Google's Cloud Speech-to-Text. Google leads the industry in this space and has speech recognition in 120 languages."

    Even though that page is about Firefox, it discusses the Web Speech API in general.
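If you want to check a particular browser empirically, here is a minimal sketch, assuming a browser that exposes `SpeechRecognition` or the prefixed `webkitSpeechRecognition` used by Chrome and Edge. The Web Speech API specification defines a `network` error code for the case where network communication required to complete recognition fails. One way to check is to run this code, take the machine offline, speak, and see whether that error fires. The function name `startDictation`, its callback parameters, and the `en-US` language are illustrative choices, not part of the API. Also note that requests made by the browser's own recognition service may not appear in the page's developer tools, which could explain why nothing showed up in the console.

```javascript
// Sketch: start one-shot dictation and report the spec's "network" error.
// `scope` is the global object (pass `window` in a browser). Injecting it
// is a hypothetical design choice that keeps the function testable elsewhere.
function startDictation(scope, onText, onNetworkError) {
  // Chrome and Edge expose the constructor with a webkit prefix.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    throw new Error("SpeechRecognition is not supported in this environment");
  }

  const recognition = new Recognition();
  recognition.lang = "en-US";         // illustrative recognition language
  recognition.interimResults = false; // only report final results
  recognition.maxAlternatives = 1;    // one transcript per result

  recognition.onresult = (event) => {
    // results[i][j] is the j-th alternative of the i-th result.
    const result = event.results[event.resultIndex];
    onText(result[0].transcript);
  };

  recognition.onerror = (event) => {
    // "network": network communication required for recognition failed.
    if (event.error === "network") {
      onNetworkError();
    }
  };

  recognition.start();
  return recognition;
}

// Browser usage (commented out so the snippet also loads outside a browser):
// startDictation(window,
//   (text) => console.log("Heard:", text),
//   () => console.warn("Recognition needed the network and it failed"));
```

If the `network` callback fires while offline, that browser's implementation depends on a remote service. If recognition keeps working, it is likely running locally. Either way, the result applies only to the browser and version you tested.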