Tags: python, artificial-intelligence, speech-recognition, speech-to-text, video-subtitles

How to add timestamps to a transcript file to match an audio file? (result is an SRT file)


I use a speech-to-text API from a Python script to generate SRT files (subtitles with timestamps) for audio/video. The recognized text is not 100% accurate, though. I also have a transcript of the audio that is accurate, although it contains some unnecessary lines. How can I add timestamps to the transcript so that the result is an SRT file with the lines from the transcript and the timestamps from the audio?

The timestamps in the SRT file generated by the API are very accurate, but the text sometimes is not. So the input is a transcript (plain text) with accurate lines plus some unnecessary lines, and the output should be an SRT file in which the transcript lines are matched to the timestamps from the audio (the timestamps produced by the API).

So basically I need some Python code that attaches lines from the input transcript to the timestamps generated by the API. Maybe this is possible by comparing the lines of the transcript with the lines of the transcribed audio and replacing the latter whenever the match is close enough. Thanks.


Solution

  • The specific task you are describing is called forced alignment. The following repository is a good collection of tools you can explore for forced alignment, including some Python tools:

    https://github.com/pettarin/forced-alignment-tools
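
If the API-generated SRT already has good timings, the simpler idea from the question (compare each recognized line with the transcript and swap in the closest match) can be tried before reaching for a full forced aligner. The following is only a minimal sketch of that text-matching approach, not forced alignment itself. It uses the standard-library difflib module. The function names (parse_srt, replace_cue_text, format_srt) and the 0.6 similarity threshold are assumptions made for illustration and would need tuning on real data.

```python
import difflib
import re


def parse_srt(srt_text):
    """Return a list of (timing_line, text) tuples parsed from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Expected block layout: index, "start --> end", one or more text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            text = " ".join(line.strip() for line in lines[2:])
            cues.append((lines[1].strip(), text))
    return cues


def normalize(text):
    """Lowercase and drop punctuation so the comparison focuses on the words."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def replace_cue_text(cues, transcript_lines, threshold=0.6):
    """Swap each cue's text for the most similar transcript line.

    The threshold is a hypothetical starting value. Cues whose best match
    scores below it keep the API text, and transcript lines that never
    match (the "unnecessary" ones) are simply not used.
    """
    result = []
    for timing, text in cues:
        best_line, best_score = text, 0.0
        for candidate in transcript_lines:
            score = difflib.SequenceMatcher(
                None, normalize(text), normalize(candidate)
            ).ratio()
            if score > best_score:
                best_line, best_score = candidate, score
        result.append((timing, best_line if best_score >= threshold else text))
    return result


def format_srt(cues):
    """Serialize (timing_line, text) tuples back into SRT text."""
    blocks = [
        f"{index}\n{timing}\n{text}"
        for index, (timing, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    api_srt = (
        "1\n00:00:01,000 --> 00:00:03,000\nhello word how are you\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\nim fine thanks\n"
    )
    transcript = ["Hello world, how are you?", "[music playing]", "I'm fine, thanks."]
    print(format_srt(replace_cue_text(parse_srt(api_srt), transcript)))
```

This only works when each API cue corresponds roughly to one transcript line. If the cue boundaries and the transcript line breaks differ a lot, a forced-alignment tool from the list above is the more robust choice, since it aligns the transcript against the audio directly.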