android-speech-api

NetworkSpeechRecognizer vs SodaSpeechRecognizer


While trying to troubleshoot a challenging SpeechRecognizer.java issue, I noticed Logcat messages with TAGs that are not SpeechRecognizer's:

12:34:00.933 NetworkSpeechRecognizer          pid-29210  I  Online recognizer - start listening
12:34:00.936 SodaSpeechRecognizer             pid-29210  I  Offline recognizer - start listening
12:34:00.936 SodaSpeechRecognizer                        I  Initialize Soda [locale: en-US]
12:34:00.963 SodaSpeechRecognizer                        I  Initialize Soda with language pack directory
12:34:01.099 SodaSpeechRecognizer                        I  Offline recognizer - start detection
12:34:02.975 NetworkSpeechRecognizer                     I  #cancel
12:34:02.976 NetworkSpeechRecognizer          pid-29210  I  #cancel
12:34:02.976 NetworkSpeechRecognizer                     I  #failWithException
12:34:02.977 SodaSpeechRecognizer                        I  startDetection successful
12:34:02.979 NetworkSpeechRecognizer                     W  Recognizer network error
12:34:02.979 SodaSpeechRecognizer                        I  Offline recognizer - stop detection
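
For context, the log lines above come from an ordinary SpeechRecognizer session; a minimal sketch of the kind of code that triggers both recognizers is shown below. The activity and listener are my own (assuming RECORD_AUDIO has already been granted), and nothing in it selects SODA explicitly.

import android.app.Activity
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

class ListenActivity : Activity() {

    private lateinit var recognizer: SpeechRecognizer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        recognizer = SpeechRecognizer.createSpeechRecognizer(this)
        recognizer.setRecognitionListener(object : RecognitionListener {
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {
                // SpeechRecognizer.ERROR_NETWORK here lines up with the
                // "Recognizer network error" line in the excerpt above.
            }
            override fun onResults(results: Bundle?) {}
            override fun onPartialResults(partialResults: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })

        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                     RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")
        }
        // In the excerpt above, both "start listening" lines appear at this point.
        recognizer.startListening(intent)
    }

    override fun onDestroy() {
        recognizer.destroy()
        super.onDestroy()
    }
}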

Update: I just found the article SODA: Speech On-Device API, along with this page. A nice clue, but insufficient, as it is not official documentation.

The following official documentation from Google seems to be much better: https://cloud.google.com/speech-to-text/priv/docs/ondevice-overview

Yet it remains mysterious. Quoting:

This product is a private feature. The documentation is publicly available but you must contact Google for full access.


Solution

  • Attempting to answer my questions (until a better answer comes along):

    1. SodaSpeechRecognizer is the on-device speech recognizer (SODA is an acronym for "Speech On-Device API"). It is not yet clear when it was introduced. Could it have arrived with the introduction of com.google.android.googlequicksearchbox (Android 4.1, AKA "Jelly Bean", released in June 2012)? A sketch of the public APIs that expose on-device recognition follows this list.
    2. The best way to learn about it is https://cloud.google.com/speech-to-text/priv/docs/ondevice-overview, though one must contact Google for full access.
    3. Those tags were likely defined in a C++ module named soda_async_impl.cc, as it is the only one visible in Logcat that produces SODA-related messages:
    native                    W  W0206 12:01:28.459973   32167 soda_async_impl.cc:320] Creating soda_impl.
                              W  W0206 12:01:28.460534   32167 soda_async_impl.cc:322] Created with thread priority 10
    native                    W  W0206 12:01:29.632099   32167 soda_async_impl.cc:484] SODA session starting (require_hotword:0, hotword_timeout_in_millis:0, trigger_type:TRIGGER_TYPE_UNSPECIFIED, hybrid_asr_config.mode:MODE_DEFAULT)
    native                    W  W0206 12:01:29.634873   32200 soda_async_impl.cc:1203] SODA received first mic audio buffer, size in bytes: 320, format: 1, channels: 1, : sample rate: 16000
    native                    W  W0206 12:01:30.154026   32200 soda_async_impl.cc:959] Not receiving any loopback audio in 500ms. Last audio received time: 0, Current time in us: 1675677690154022
    native                    W  W0206 12:01:33.162399   32200 soda_async_impl.cc:861] SODA stopped processing audio, mics audio processed in millis: 4880, loopback audio processed in millis: 0
                              W  W0206 12:01:33.265670   32200 soda_async_impl.cc:916] SODA session stopped due to: MIC_END_OF_DATA
                              W  W0206 12:01:33.319812   32171 soda_async_impl.cc:987] Deleting soda_impl
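
As a practical footnote: since API 23 there is RecognizerIntent.EXTRA_PREFER_OFFLINE, and since API 31 the framework exposes SpeechRecognizer.isOnDeviceRecognitionAvailable() and SpeechRecognizer.createOnDeviceSpeechRecognizer(). The sketch below shows those public knobs; the helper names are mine, and the assumption that they route to the SODA-backed SodaSpeechRecognizer path is exactly that, an assumption, not something the documentation confirms.

import android.content.Context
import android.content.Intent
import android.os.Build
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

// Hypothetical helper: prefer the explicit on-device recognizer when the
// platform offers one (API 31+), otherwise fall back to the default service.
fun createPreferredRecognizer(context: Context): SpeechRecognizer =
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S &&
        SpeechRecognizer.isOnDeviceRecognitionAvailable(context)) {
        SpeechRecognizer.createOnDeviceSpeechRecognizer(context)
    } else {
        SpeechRecognizer.createSpeechRecognizer(context)
    }

// Hypothetical helper: build a recognition intent that hints at offline use.
fun buildRecognitionIntent(): Intent =
    Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                 RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")
        // API 23+: ask the recognizer to prefer offline recognition.
        putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true)
    }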