javascript html5-audio speech-to-text amazon-polly

How to highlight words as they are spoken in Amazon Polly audio


I'm trying to implement Amazon Polly in an MVC application. I was able to retrieve the audio for the text, and that works fine. Now I'm trying to highlight the corresponding text on the web page while the audio plays, like the rear speaker does. My aim is to implement this without a third-party application. I went through the documentation but couldn't find anything useful, and I didn't find any option in Amazon Polly itself that highlights the text automatically.

Can Speech Marks be used for this? Is there any other way to do it?

Thanks in Advance :)

Edit

I have the speech mark JSON result. Now I'm stuck on how to sync this result with an HTML audio tag.

{"time":6,"type":"word","start":0,"end":2,"value":"Hi"}
{"time":587,"type":"word","start":4,"end":6,"value":"my"}
{"time":754,"type":"word","start":7,"end":11,"value":"name"}
{"time":1147,"type":"word","start":12,"end":14,"value":"is"}
{"time":1305,"type":"word","start":15,"end":19,"value":"John"}

Solution

  • I have the speech mark JSON result. Now I'm stuck on how to sync this result with an HTML audio tag.

    You can listen for the timeupdate event on the audio element and match the audio.currentTime property against the best matching speech mark from Polly.

    This assumes that audio is the JavaScript variable referencing the HTML audio element whose src is set to the audio file that Polly returned for the same text as the speech marks:

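    // Returns the last speech mark whose time is at or before `time` (in ms).
    // Assumes speechMarks is a non-empty array sorted by ascending time.
    // Falls back to the first mark if `time` is earlier than all of them.
    function getSpeechMarkAtTime(speechMarks, time) {
      let match = speechMarks[0]
    
      for (const speechMark of speechMarks) {
        // The marks are sorted, so stop at the first one still ahead of `time`.
        if (speechMark.time > time) break
        match = speechMark
      }
    
      return match
    }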
    
    function onTimeUpdate(speechMark) {
      /**
       * Update your HTML and CSS based on the attributes
       * of the speech mark at the audio's current time.
       */
    }
    
    audio.addEventListener('timeupdate', () => {
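      // Polly Speech Marks use milliseconds; audio.currentTime is in seconds
      const currentTime = audio.currentTime * 1000
      // speechMarksJSONResult must be an array of parsed speech mark objects
      const speechMark = getSpeechMarkAtTime(speechMarksJSONResult, currentTime)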
    
      // Some custom callback
      onTimeUpdate(speechMark)
    })
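
To fill in the two pieces left open above, here is a minimal sketch, not a definitive implementation. It assumes the speech marks arrive as shown in the question, one JSON object per line rather than a single JSON array, so they must be parsed line by line before being passed to getSpeechMarkAtTime. It also assumes the spoken text is plain ASCII: Polly documents start and end as byte offsets into the input text, so they only match JavaScript string indices when every character is a single byte. The names parseSpeechMarks, splitAtSpeechMark and highlightWord are hypothetical helpers introduced here for illustration.

```javascript
// Hypothetical helper: turn Polly's line-delimited speech mark output
// into an array of word marks (one JSON object per non-empty line).
function parseSpeechMarks(rawText) {
  return rawText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line))
    .filter((mark) => mark.type === 'word')
}

// Hypothetical helper: split the original text around the spoken word.
// Assumption: ASCII text, so Polly's byte offsets equal string indices.
function splitAtSpeechMark(text, mark) {
  return {
    before: text.slice(0, mark.start),
    word: text.slice(mark.start, mark.end),
    after: text.slice(mark.end),
  }
}

// Hypothetical renderer: rebuild the container's content with the
// current word wrapped in a <mark> element. Browser only.
function highlightWord(container, text, mark) {
  const { before, word, after } = splitAtSpeechMark(text, mark)
  const highlighted = document.createElement('mark')
  highlighted.textContent = word
  container.replaceChildren(before, highlighted, after)
}
```

With these in place, onTimeUpdate could simply call highlightWord(container, text, speechMark), where container is the element showing the text and text is the exact string that was sent to Polly. Since timeupdate fires several times per second, it may be worth skipping the DOM update when the speech mark has not changed since the previous event.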