Tags: text-to-speech, pyttsx3, gtts

How do I print the current word being uttered in pyttsx3?


The official documentation for pyttsx3 gives a variation of the following example for printing each word as it is spoken. The only difference is that the print statements here use Python 3.x syntax instead of Python 2.x.

import pyttsx3
def onStart(name):
    print('starting', name)
def onWord(name, location, length):
    print('word', name, location, length)
def onEnd(name, completed):
    print('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()

It prints the following output, which does not contain the actual words:

starting None
word None 1 0
finishing None True

How can I print the actual word being uttered?

EDIT: If this is not possible in pyttsx3, I am also open to using any other text-to-speech library to accomplish it.


Solution

  • The name argument is a tag that you attach to an utterance. You have to set it yourself as the optional second argument to say, for example say("hello world", "introduction"). In that case the value of name in all of the callbacks will be introduction. Because the code in the question never sets it, name is None in every callback. From the documentation:

    say(text : unicode, name : string) → None

    Queues a command to speak an utterance. The speech is output according to the properties set before this command in the queue. Parameters:

    text – Text to speak.
    name – Name to associate with the utterance. Included in notifications about this utterance.

    You can use this by passing the text itself as the name, i.e., engine.say(sentence, sentence). The location and length arguments give the word's starting index in the text and its number of characters, so the callback can slice the current word out of name and print it.

    MCVE:

    import pyttsx3
    def onStart(name):
        print('starting', name)
    def onWord(name, location, length):
        print('word', name[location:location+length], location, length)
    def onEnd(name, completed):
        print('finishing', name, completed)
    engine = pyttsx3.init()
    engine.connect('started-utterance', onStart)
    engine.connect('started-word', onWord)
    engine.connect('finished-utterance', onEnd)
    sentence = 'The quick brown fox jumped over the lazy dog.'
    engine.say(sentence, sentence)
    engine.runAndWait()
    

    Output:

    starting The quick brown fox jumped over the lazy dog.
    word The 0 3
    word quick 4 5
    word brown 10 5
    word fox 16 3
    word jumped 20 6
    word over 27 4
    word the 32 3
    word lazy 36 4
    word dog 41 3
    finishing The quick brown fox jumped over the lazy dog. True
    

    Note that each engine implements the callbacks separately. The above was tested with the espeak engine on Linux; engines for Windows and Mac may expose different information in these callbacks.
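If duplicating the whole sentence as the name feels awkward, one possible variation is to keep name as a short tag and look the text up in a dictionary inside the callback. The following is only a sketch: the helper names word_at, make_word_printer and speak are hypothetical, and it assumes the engine reports location and length the way espeak does in the output above.

```python
def word_at(text, location, length):
    """Return the slice of text that a 'started-word' event points at."""
    return text[location:location + length]


def make_word_printer(utterances, spoken):
    """Build a 'started-word' callback that looks up the text by utterance name.

    utterances maps a short tag (the name passed to say) to its full text.
    spoken is a list that collects each word as the engine announces it.
    """
    # The parameter names match the ones used by the callbacks above.
    def on_word(name, location, length):
        word = word_at(utterances[name], location, length)
        spoken.append(word)
        print('word', word, location, length)
    return on_word


def speak(utterances):
    """Speak every utterance and return the words the engine reported."""
    # Imported here so the helpers above can be used without an audio backend.
    import pyttsx3

    spoken = []
    engine = pyttsx3.init()
    engine.connect('started-word', make_word_printer(utterances, spoken))
    for name, text in utterances.items():
        engine.say(text, name)  # name is the short tag, not the sentence
    engine.runAndWait()
    return spoken
```

Calling speak({'fox': 'The quick brown fox jumped over the lazy dog.'}) should then print one line per word, provided the installed engine fires started-word events with usable offsets.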