
Detecting a pause of 2 seconds or more in speech


Is there a reliable and consistent way to detect a pause of more than 2 seconds in an AssemblyAI real-time transcript?

https://www.assemblyai.com/docs/guides/real-time-streaming-transcription

Currently, my implementation is this:

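// Gap in milliseconds between the end of the previous final transcript
// and the start of the current one.
if (currentFinalTranscript.audio_start - previousFinalTranscript.audio_end >= 2000) {
    console.log("pause greater than 2 seconds detected");
}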

This works well when the threshold is 5000 instead of 2000, but the closer I get to 2000, the less reliable it becomes.

I am assuming that audio_start and audio_end in the real-time transcript response exclude the milliseconds during which the person was silent, since that seems to be the case when the pause is longer.

currentFinalTranscript is the final transcript received in the current socket.onMessage callback.

previousFinalTranscript is the final transcript received in the preceding socket.onMessage callback.

Other approaches, or pointers to flaws in the current logic, are welcome.


Solution

  • I have solved this issue. The logic is to count empty partial transcripts: if the number of consecutive empty partial transcripts exceeds a threshold, I declare that a pause has been detected.

    I can then adjust the threshold to approximate the duration of silence I want to detect.

    The solution is quite obvious in hindsight; I just didn't think it would work at the time.
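
    The counting logic described above could be sketched roughly as follows. This is only a minimal illustration, not the exact code I used: createPauseDetector, emptyThreshold and onPause are hypothetical names, and it assumes each parsed socket message carries a message_type field ("PartialTranscript" or "FinalTranscript") and a text field.

```javascript
// Hypothetical helper: reports a pause once `emptyThreshold` consecutive
// empty partial transcripts have been seen.
// Assumption: each parsed message has `message_type` and `text` fields.
function createPauseDetector(emptyThreshold, onPause) {
  let emptyCount = 0;   // consecutive empty partial transcripts so far
  let reported = false; // ensures one callback per stretch of silence

  return function handleMessage(message) {
    // Only partial transcripts are counted; other message types are ignored.
    if (message.message_type !== "PartialTranscript") {
      return;
    }

    const hasText =
      typeof message.text === "string" && message.text.trim() !== "";

    if (hasText) {
      // Speech resumed: reset the counter and allow a new pause report.
      emptyCount = 0;
      reported = false;
      return;
    }

    emptyCount += 1;
    if (emptyCount >= emptyThreshold && !reported) {
      reported = true;
      onPause(emptyCount);
    }
  };
}
```

    It could be wired up along the lines of socket.onmessage = (event) => handle(JSON.parse(event.data)), where handle is the function returned by createPauseDetector. The threshold that corresponds to roughly 2 seconds depends on how often partial transcripts arrive, so it has to be calibrated by experiment rather than computed.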