react-native, speech-recognition, react-native-voice

React-Native-voice for speech recognition: how can users be given additional pause time?


How can we extend the speech recognition listening time in React-Native-voice to accommodate users who need more time to think before verbalizing their thoughts, so that recognition doesn't stop prematurely when they pause without having completed their intended message?


Solution

  • If your app is on the Android platform, you can adjust the RecognizerIntent constant EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS to increase how long the recognizer waits after it stops hearing speech before it considers the input complete and ends the recognition session.

    To make this change in React Native (RN), follow these steps:

    1. Locate the file react-native-voice/src/index.js.

    2. In this file, find the following code snippet:

      if (Platform.OS === 'android') {
        Voice.startSpeech(
          locale,
          Object.assign(
            {
              EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
              EXTRA_MAX_RESULTS: 5,
              EXTRA_PARTIAL_RESULTS: true,
              REQUEST_PERMISSIONS_AUTO: true,
            },
            options,
          ),
          callback,
        );
      } else {
        Voice.startSpeech(locale, callback);
      }

    3. In the Object.assign() call, add the EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS parameter and set it to the desired time in milliseconds. For example, to wait 5000 milliseconds of silence, add the line EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 5000, so that the updated code looks like the snippet below.
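
    For context, this is the Object.assign() portion from step 2 with the new parameter added (5000 ms is just an example value):

      Object.assign(
        {
          EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
          EXTRA_MAX_RESULTS: 5,
          EXTRA_PARTIAL_RESULTS: true,
          REQUEST_PERMISSIONS_AUTO: true,
          // Wait 5 seconds of silence before ending the session.
          EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 5000,
        },
        options,
      ),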

    By making this adjustment, you control how long the recognizer waits for silence before considering the input complete, giving users more time to pause and think without the recognition session ending prematurely.

    While attempting to extend the listening duration for speech recognition, you may encounter an issue where the app continues to listen for the additional 5 seconds after silence but then crashes. If this happens, you may notice that the _onSpeechResults() function returns a null matches object, resulting in a null speech result, which is the likely cause of the crash.

    Interestingly, even with the null matches in _onSpeechResults(), the _onSpeechPartialResults() function still receives multiple messages containing partial results.

    For example, Figure 1 shows the partial results that were captured:

    Figure 1: The captured partial results from _onSpeechPartialResults()
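
    Each of these events delivers its recognized text in a value string array, which is the shape the handlers below rely on. A minimal illustration (the transcript text here is made up):

    // Illustrative payload of a single partial-results event; the
    // recognized chunks arrive in a `value` string array.
    const examplePartialEvent = { value: ['I was thinking about'] };
    console.log(examplePartialEvent.value[0]); // -> 'I was thinking about'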

    In _onSpeechPartialResults, we can see that the end of speech has been reached even when the response from _onSpeechResults is null. I also noticed that the most complete chunk of the speech always arrives after the onSpeechEnd event. Therefore, instead of relying on the null result provided by onSpeechResults, we can gather and join the chunks of partial results received after each onSpeechEnd event.

    Therefore, as shown below, I wrapped the result handling in onSpeechResults() in a null check and collected the partial results from _onSpeechPartialResults in a list, joining them to form a more complete transcript of the speech, which makes the final result more accurate and reliable. (Note the speechEnded flag, set in an onSpeechEnd handler, so that only the chunk following each speech-end event is kept.)

    let speechEnded = false;
    const messageParts = [];
    
    // Mark the end of a speech segment; the next partial result
    // after this event carries the most complete chunk so far.
    Voice.onSpeechEnd = () => {
      speechEnded = true;
    };
    
    Voice.onSpeechPartialResults = (r) => {
      console.log('Partial Results ' + JSON.stringify(r));
      // Keep only the first partial result after each onSpeechEnd
      // event, since it holds the most complete chunk.
      if (speechEnded && r.value && r.value[0] != null) {
        messageParts.push(r.value[0]);
      }
      speechEnded = false;
    };
    
    Voice.onSpeechResults = (res) => {
      console.log('speech results: ' + JSON.stringify(res));
      // Guard against the null matches object: fall back to the
      // collected partial chunks joined into one transcript.
      let speech = res.value ? res.value[0] : null;
      if (speech == null) {
        speech = messageParts.join(' ');
      }
    };
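
    For completeness, here is a minimal sketch of how this might be wired up, assuming the messageParts array and handlers above are in scope. The startListening/stopListening helpers and the 'en-US' locale are illustrative choices, not part of the library:

    import Voice from 'react-native-voice';
    
    // Illustrative helpers; the names are not part of the library API.
    async function startListening() {
      try {
        // Reset the collected chunks so a new session starts clean.
        messageParts.length = 0;
        // With the index.js change above, the longer silence timeout
        // is applied automatically on Android.
        await Voice.start('en-US');
      } catch (e) {
        console.error('Failed to start voice recognition', e);
      }
    }
    
    async function stopListening() {
      try {
        await Voice.stop();
      } catch (e) {
        console.error('Failed to stop voice recognition', e);
      }
    }

    Clearing messageParts on each start matters: without it, chunks collected in a previous session would leak into the next transcript.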