Is there a reliable and consistent method to detect a pause of more than 2 seconds in an AssemblyAI real-time transcript?
https://www.assemblyai.com/docs/guides/real-time-streaming-transcription
Currently, my implementation is this:
if (currentFinalTranscript.audio_start - previousFinalTranscript.audio_end >= 2000) {
  console.log("pause greater than 2 seconds detected");
}
This works well when the threshold is 5000 instead of 2000.
But the closer I get to 2000, the less reliable it becomes.
I am assuming that audio_start and audio_end in the realtimeTranscript response exclude the milliseconds during which the person was silent, as this seems to be the case when the pause is longer.
currentFinalTranscript is the final transcript received in the new socket.onMessage callback.
previousFinalTranscript is the final transcript received in the preceding socket.onMessage callback.
Other logical approaches, or pointers to flaws in the current logic, are welcome.
I have successfully tackled this issue. The logic is to count empty partial transcripts: if the number of consecutive empty partial transcripts exceeds a threshold, I declare that a pause has been detected.
I can then adjust the threshold to approximate the duration of the silence.
The solution is quite obvious; at the time, I just didn't think it would work.
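For anyone who wants a starting point, here is a minimal sketch of that counting logic. It is not my exact code: createPauseDetector, emptyThreshold and onPause are names made up for illustration, and it assumes each parsed message has a message_type of "PartialTranscript" and a text field. The threshold has to be tuned by experiment, since the number of empty partials per second is not something this sketch knows.

```javascript
// Sketch: detect a pause by counting consecutive empty partial transcripts.
// Assumption: each parsed message looks like { message_type, text }.
// createPauseDetector, emptyThreshold and onPause are hypothetical names.
function createPauseDetector(emptyThreshold, onPause) {
  let emptyCount = 0;        // consecutive empty partial transcripts seen so far
  let pauseReported = false; // ensures one callback per stretch of silence

  return function handleMessage(message) {
    // Only partial transcripts are counted; other message types are ignored.
    if (message.message_type !== "PartialTranscript") {
      return;
    }

    const text = (message.text || "").trim();

    if (text === "") {
      emptyCount += 1;
      // "Exceeds the threshold": fire once when the count goes above it.
      if (emptyCount > emptyThreshold && !pauseReported) {
        pauseReported = true;
        onPause(emptyCount);
      }
    } else {
      // Speech resumed, so start counting from zero again.
      emptyCount = 0;
      pauseReported = false;
    }
  };
}
```

To use it, create one detector per session, for example createPauseDetector(8, () => console.log("pause detected")), and call the returned function with each parsed message inside the socket message callback. The value 8 is only a placeholder; raise or lower it until the detected pause is close to 2 seconds in your setup.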