reactjs annotations text-to-speech

How to highlight text on a website in real time as the audio narrates it


I am trying to figure out which technology to use to highlight text in sync with the audio, much like what https://speechify.com/ is doing.

This assumes I can already run a TTS algorithm and convert the text to speech. I have tried multiple sources, but I am unable to pinpoint the exact technology or method for highlighting the text as the audio is spoken.

Any help would be much appreciated. I have already wasted 2 days on the internet to figure this out but no luck :(


Solution

  • A simple approach would be to listen for the boundary event that SpeechSynthesisUtterance emits and highlight words with vanilla JS. The event gives us character indices into the spoken text, so no need to go crazy with regexes or super AI stuff :)

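    Before anything else, make sure the API is available:

    // Run this inside a function or event handler:
    // a bare `return` is not valid at the top level of a script
    const synth = window.speechSynthesis
    if (!synth) {
      console.error('no tts for you!')
      return
    }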
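
If you'd rather decide up front whether to show the "read aloud" button at all, the check can be pulled into a small helper. This is only a sketch: `hasSpeechSynthesis` is a name I made up, and it takes the global object as a parameter so it can be tried outside a browser too.

```javascript
// Hypothetical helper: reports whether the given global object exposes
// both pieces of the Web Speech API used in this answer.
function hasSpeechSynthesis(globalObject) {
  return (
    typeof globalObject === 'object' &&
    globalObject !== null &&
    'speechSynthesis' in globalObject &&
    typeof globalObject.SpeechSynthesisUtterance === 'function'
  )
}

// In a browser you would call it as: hasSpeechSynthesis(window)
```

It also checks for the SpeechSynthesisUtterance constructor, since the code further down needs both.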
    

    The TTS utterance emits a 'boundary' event each time it reaches a word; we can use it to highlight text.

    // `highlight` is defined in the full example below
    const text = document.getElementById('text')
    const originalText = text.innerText
    const utterance = new SpeechSynthesisUtterance(originalText)
    utterance.addEventListener('boundary', event => {
      // charIndex/charLength locate the word being spoken inside originalText
      const { charIndex, charLength } = event
      text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
    })
    synth.speak(utterance)
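
One thing to watch: the snippet above relies on `charLength`, and I am assuming here that some browsers or voices may leave it undefined or 0. A minimal sketch of a fallback that scans forward to the next whitespace instead (`wordRange` is a hypothetical helper, not part of the Web Speech API):

```javascript
// Hypothetical helper: turns boundary-event data into a { from, to } range.
// Assumption: charLength can be missing or 0, in which case we guess the
// end of the word by looking for the next whitespace character.
function wordRange(text, charIndex, charLength) {
  if (Number.isInteger(charLength) && charLength > 0) {
    return { from: charIndex, to: charIndex + charLength }
  }
  const rest = text.slice(charIndex)
  const nextSpace = rest.search(/\s/)
  const length = nextSpace === -1 ? rest.length : nextSpace
  return { from: charIndex, to: charIndex + length }
}
```

In the listener you would then call `wordRange(originalText, event.charIndex, event.charLength)` and pass `from` and `to` on to `highlight`. Note that the fallback keeps trailing punctuation as part of the word.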
    

    Full example:

    const btn = document.getElementById("btn")
    
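    // Wrap text[from, to) in a highlighted span; the rest of the text is left as-is
    const highlight = (text, from, to) => {
      const replacement = highlightBackground(text.slice(from, to))
      return text.substring(0, from) + replacement + text.substring(to)
    }
    // Note: `sample` is inserted as raw HTML, so this assumes the text has no markup characters
    const highlightBackground = sample => `<span style="background-color:yellow;">${sample}</span>`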
    
    btn && btn.addEventListener('click', () => {
      const synth = window.speechSynthesis
      if (!synth) {
        console.error('no tts')
        return
      }
      let text = document.getElementById('text')
      let originalText = text.innerText
      let utterance = new SpeechSynthesisUtterance(originalText)
      utterance.addEventListener('boundary', event => {
        const { charIndex, charLength } = event
        text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
      })
      synth.speak(utterance)
    })
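
Since the example writes the result back through `innerHTML`, any `<` or `&` in the text would be parsed as markup. A rough sketch of an escaping variant follows. `escapeHtml` and `highlightSafe` are names I made up, and this is one possible way to do it rather than the only one:

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
const escapeHtml = (raw) =>
  raw
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')

// Same idea as `highlight`, but each piece of text is escaped separately
// so the character indices still refer to the original, unescaped text.
const highlightSafe = (text, from, to) =>
  escapeHtml(text.slice(0, from)) +
  '<span style="background-color:yellow;">' +
  escapeHtml(text.slice(from, to)) +
  '</span>' +
  escapeHtml(text.slice(to))
```

Slicing first and escaping afterwards matters: escaping the whole string up front would shift the indices reported by the boundary event.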
    

    CodeSandbox link

    This is pretty basic, and you can (and should) improve it.
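
One possible improvement, as a sketch: clear the highlight when the utterance fires its 'end' event, so the last word does not stay yellow. `attachHighlighting` and the `render(from, to)` callback are hypothetical names. The helper only needs an object with `addEventListener`/`removeEventListener`, which a SpeechSynthesisUtterance has.

```javascript
// Hypothetical helper: wires an utterance-like EventTarget to a render callback.
// `render(from, to)` is an assumed callback that redraws the highlight.
function attachHighlighting(utterance, render) {
  const onBoundary = (event) => {
    const length = event.charLength || 0
    render(event.charIndex, event.charIndex + length)
  }
  const onEnd = () => {
    render(0, 0) // an empty range clears the highlight
    utterance.removeEventListener('boundary', onBoundary)
  }
  utterance.addEventListener('boundary', onBoundary)
  utterance.addEventListener('end', onEnd, { once: true })
}
```

You would call it right before `synth.speak(utterance)`. Calling `synth.cancel()` first also stops a previous utterance from overlapping when the button is clicked twice.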

    Edit

    Oops, I forgot that this was tagged reactjs. Here's the same example with React (CodeSandbox link is in the comments):

    import React from "react";
    
    const ORIGINAL_TEXT =
      "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";
    
    const splitText = (text, from, to) => [
      text.slice(0, from),
      text.slice(from, to),
      text.slice(to)
    ];
    
    const HighlightedText = ({ text, from, to }) => {
      const [start, highlight, finish] = splitText(text, from, to);
      return (
        <p>
          {start}
          <span style={{ backgroundColor: "yellow" }}>{highlight}</span>
          {finish}
        </p>
      );
    };
    
    export default function App() {
      const [highlightSection, setHighlightSection] = React.useState({
        from: 0,
        to: 0
      });
      const handleClick = () => {
        const synth = window.speechSynthesis;
        if (!synth) {
          console.error("no tts");
          return;
        }
    
        let utterance = new SpeechSynthesisUtterance(ORIGINAL_TEXT);
        utterance.addEventListener("boundary", (event) => {
          const { charIndex, charLength } = event;
          setHighlightSection({ from: charIndex, to: charIndex + charLength });
        });
        synth.speak(utterance);
      };
    
      return (
        <div className="App">
          <HighlightedText text={ORIGINAL_TEXT} {...highlightSection} />
          <button onClick={handleClick}>klik me</button>
        </div>
      );
    }