audiospeech-recognitionaudio-streamingspeech-to-textweb-mediarecorder

Implementing a real time speech recognition using web Media Recorder API in React for the Front-End and Python for back-end


What we're trying to implement?

we deployed an AI model to stream the audio from microphone and display the text of the speech to the user. something like this.

What technologies are used?

What's the problem though?

In the front-end, I try to send audio chunks every second as an Int16Array to the back-end. also to make sure everything related to the mic and audio works fine, after stop recording I can download the first chunk of the audio only with duration of 1s which is pretty clear. However, when the audio is sanded to the backend it becomes to some bunch of noise!

Here's the part of the React code when the recording is getting handle:

        useEffect(()=> {
      if (recorder === null) {
        if (isRecording) {
          requestRecorder().then(setRecorder, console.error);
        } else {
          return;
        }
      }
  
      // Manage recorder state.
      if (isRecording && recorder) {
        recorder.start();
      } else if (!isRecording && recorder) {
        recorder.stop();
      }
 
    // send the data every second
    const ineterval = setInterval(() => {
      if (recorder) {
        recorder.requestData();
      }
      }, 1000);

    // Obtain the audio when ready.
    const handleData = e => {
      setAudioURL(URL.createObjectURL(e.data));
      let audioData = []
      audioData.push(e.data)
      const audioBlob = new Blob(audioData, {'type' : 'audio/wav; codecs=0' })
      
      const instanceOfFileReader = new FileReader();
      instanceOfFileReader.readAsArrayBuffer(audioBlob);


      instanceOfFileReader.addEventListener("loadend", (event) => {
      console.log(event.target.result.byteLength);
      const arrayBuf = event.target.result
      const int16ArrNew = new Int16Array(arrayBuf, 0, Math.floor(arrayBuf.byteLength / 2));

            
      setJsonData(prevstate => ({...prevstate, 
      matrix: int16ArrNew,}))
      })

    };
    if (recorder) {
      recorder.addEventListener("dataavailable", handleData);
    }
    return () => {
      if (recorder) {
        recorder.removeEventListener("dataavailable", handleData)
        clearInterval(ineterval)
      }
  };
    }, [recorder, isRecording])

Is there anyone faced this issue before? had a lot of research about it but found nothing to fix this.


Solution

  • Just checked this question and smiled:)) ..last year this was a real nightmare for me :))..so just put the name of the library for anyone who will see it in the future. To achieve real-time transition no matter what you will need webRTC. For real-time recording you can simply use the recordRTC package and install it using npm. There isn't many configurations and it's completely straightforward.