How can I generate timed-text (e.g. for subtitles) synchronised with Text-to-Speech (TTS) word-by-word?
I'd like to do this using high-quality SAPI5 voices (e.g. those available from IVONA here), which I have used on Windows 10.
On Windows we already have some good free TTS programs:
TTSApp can produce audio files in WAV. Balabolka creates MP3 files along with synchronised timed text as LRC files (the format used for karaoke), but only line by line, not word by word.
However, both highlight each word on screen in real time as they speak.
If I had some TTS/SAPI5 source code I could simply check the clock every time a new word starts to be generated and write the time and that word to a file. Does anyone know of any project that exposes that level of programming - so I might start from there?
UPDATE SEPT 2016
I've since discovered that TTSApp was reimplemented in AutoHotkey by a certain jballi in 2012.
I've adapted that code to append the time in ms to a text file every time the OnWord event handler fires. I still need to make two passes:
I am still hoping to find a way to accelerate step 2.
BTW, the Visual Basic source appears to be archived here.
It is possible to do all of this offline!
You generate a WAV file using SAPI while setting the DoEvents argument of SpFileStream.Open to True (documented here).
A binary representation of each event (e.g. phoneme/word/sentence) gets appended to the end of the WAV file. A certain Hans documented the WAV/SAPI format in 2009 here.
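To see what was actually appended, it can help to list the top-level chunks of the generated file. The Python sketch below walks the standard RIFF layout (a 4-byte ID, a 4-byte little-endian size, then data padded to an even length). It assumes the event data is stored as an extra RIFF chunk after the audio rather than as raw trailing bytes, and it does not decode the event records themselves. The name `list_riff_chunks` is made up for illustration.

```python
import struct

def list_riff_chunks(data: bytes):
    """Return (chunk_id, data_offset, size) for each top-level chunk of a RIFF/WAVE file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the overall size field, and "WAVE"
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), pos + 8, size))
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return chunks

# Usage (hypothetical file name):
#   with open("speech.wav", "rb") as f:
#       for chunk_id, offset, size in list_riff_chunks(f.read()):
#           print(chunk_id, offset, size)
```

Any chunk listed after `fmt ` and `data` is a candidate for the event block; its layout would still have to be taken from the format description mentioned above.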
This can all be done with a simple modification of jballi's 2012 AutoHotkey version of TTSApp.
Basically, you replace these lines of code in Example1GUI.ahk:
SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,False)
;-- Set the output stream to the file stream
SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
SpVoice.AudioOutputStream:=SpFileStream
;-- Speak using the given flags
SpVoice.Speak(Text,SpeakFlags)
with the following:
SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,True) ;-- DoEvents
;-- Set the output stream to the file stream
SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
SpVoice.AudioOutputStream:=SpFileStream
if not Sink ;-- DoEvents label
{
    ComObjConnect(SpVoice, "On")
    Sink:=True
}
;-- Speak using the given flags
SpVoice.Speak(Text,SpeakFlags|SVSFlagsAsync|SVSFPurgeBeforeSpeak)
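
Once a start time has been logged for each word, turning the log into word-by-word timed text is a small post-processing step. The Python sketch below is one possible way to do it, not part of the AutoHotkey script, and all of its names are hypothetical. It writes one word per LRC line, the simplest word-by-word layout. `bytes_to_ms` assumes the position reported with each word event is a byte offset into uncompressed PCM audio, and its default format (22050 Hz, mono, 16-bit) is only a placeholder that must match the format the file was actually written in.

```python
def bytes_to_ms(stream_pos, sample_rate=22050, channels=1, bytes_per_sample=2):
    """Convert a byte offset in PCM audio to milliseconds (assumed interpretation)."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return stream_pos * 1000 // bytes_per_second

def lrc_timestamp(ms):
    """Format milliseconds as an LRC tag, [mm:ss.xx], with hundredths of a second."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"

def words_to_lrc(events):
    """events: iterable of (start_ms, word) pairs, in spoken order."""
    return "\n".join(f"{lrc_timestamp(ms)}{word}" for ms, word in events)

# Usage with made-up timings:
#   print(words_to_lrc([(0, "Hello"), (450, "world")]))
```

If the handler already logs times in ms, only `words_to_lrc` is needed; `bytes_to_ms` matters only when the log holds stream positions instead.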