python linux subprocess alsa espeak

Convert python espeak + subprocess code to play output audio directly


I'm using an existing program that reads XML from a socket, converts the text to a WAV file, and then plays it over the audio output device.

I'd like to strip it down so it just plays the text directly to the audio device.

Right now I'm having a difficult time figuring out whether I've got the correct code, and whether it's actually creating the WAV file.

Function that calls the text-to-speech function:

def generate_audio(self, language, voice=None):
    info = self.get_first_info(language, bestmatch=False)
    if info is None:
        self.media_info[language] = None
        return False

    truncate = not self.broadcast_immediately() and bcastplayer.Config.setting('alerts_truncate')
    message_text = info.get_message_text(truncate)

    location = bcastplayer.ObData.get_datadir() + "/alerts"
    if not os.access(location, os.F_OK):
        os.mkdir(location)
    filename = self.reference(self.sent, self.identifier) + "-" + language + ".wav"

    resources = info.get_resources('audio')
    if resources:
        if resources[0].write_file(os.path.join(location, filename)) is False:
            return False

    elif message_text:
        self.write_tts_file(os.path.join(location, filename), message_text, voice)

    else:
        return False

Can this be modified to play the audio directly?

def write_tts_file(self, path, message_text, voice=None):
    if not voice:
        voice = 'en'
    proc = subprocess.Popen([ 'espeak', '-m', '-v', voice, '-s', '130', '--stdout' ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
    (stdout, stderr) = proc.communicate(message_text.encode('utf-8') + b" <break time=\"2s\" /> " + message_text.encode('utf-8') + b" <break time=\"3s\" /> ")
    proc.wait()

    with open(path, 'wb') as f:
        f.write(stdout)

I've never seen code like this using process, subprocess, stdout, PIPE.

Is it easy to change the subprocess code to something that just pipes or redirects the output to aplay without creating the WAV file?

Another answer might give a clue, but as a newbie I'm not sure how to adapt this code to what it describes:

How to use python Popen with a espeak and aplay


Solution

  • You can link the two processes together using subprocess.PIPE. Here is a modified version of the write_tts_file function:

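    def write_tts_file(self, path, message_text, voice=None):
        # 'path' is kept so existing callers still work; no file is written.
        if not voice:
            voice = 'en'
        proc = subprocess.Popen(['espeak', '-m', '-v', voice, '-s', '130', '--stdout'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
        # aplay reads the WAV data directly from espeak's stdout.
        aplay = subprocess.Popen(['aplay', '-D', 'sysdefault'], stdin=proc.stdout)
        proc.stdout.close()  # drop our copy so espeak gets SIGPIPE if aplay exits early
        proc.stdin.write(message_text.encode('utf-8') + b" <break time=\"2s\" /> " + message_text.encode('utf-8') + b" <break time=\"3s\" /> \n")
        proc.stdin.close()  # signals end of input to espeak
        proc.wait()
        aplay.wait()  # block until playback has finished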

    It is important to close proc's stdin after you have sent the message that is to be spoken. Closing it signals end of input to espeak, which then exits once it has written its audio data. That closes the pipe feeding aplay, which in turn exits when it has finished playing. If proc's stdin is never closed, neither process will exit.
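
The same pipeline can also be sketched as a standalone function, which may be easier to try outside the class. This is only a sketch: the name speak and the tts_cmd / player_cmd parameters are hypothetical, added so the two commands can be swapped out, and the default espeak and aplay arguments are simply the ones from the function above.

```python
import subprocess


def speak(message_text, voice="en", tts_cmd=None, player_cmd=None):
    """Pipe text-to-speech output straight into a player process.

    Returns the player's exit code. tts_cmd and player_cmd are
    hypothetical hooks for substituting other commands.
    """
    if tts_cmd is None:
        tts_cmd = ["espeak", "-m", "-v", voice, "-s", "130", "--stdout"]
    if player_cmd is None:
        player_cmd = ["aplay", "-D", "sysdefault"]

    # The TTS process reads text on stdin and writes audio to stdout.
    tts = subprocess.Popen(tts_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # The player reads that audio directly from the TTS process's stdout.
    player = subprocess.Popen(player_cmd, stdin=tts.stdout)
    # Close the parent's copy of the pipe so only the player holds the read end.
    tts.stdout.close()

    tts.stdin.write(message_text.encode("utf-8"))
    # Closing stdin signals end of input, letting the TTS process finish.
    tts.stdin.close()

    tts.wait()
    return player.wait()
```

With espeak and aplay installed, calling speak("Hello world") should play the text and return 0 once playback has finished. Waiting on the player as well as the TTS process is a design choice here, so that the call only returns after the audio is done.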