Tags: python, amazon-web-services, delay, lag, amazon-polly

Python scripts work fine when tested individually but are very slow when combined: speech-controlled assistant using Amazon Polly


I am building a voice-controlled bot to control my PC. I recently switched from the built-in Windows text-to-speech (TTS) to Amazon Polly, and since then I've noticed that some functions have a long delay between being triggered and returning. All the functions run without any delay if I bypass the speech-to-text path (the microphone and the Google recognizer). But when I triggered the time function by voice, for example, more than 4 minutes passed between the function starting and the time being spoken.

Be aware that I'm an amateur, self-taught programmer, so my code is a bit of a mess. This is my main script, which handles the speech-to-text. I suspect it may be interacting badly with the other parts and causing the problem, but honestly I have no idea.

I'm not even sure how to find out whether it is the cause, since nothing actually fails; things just take a very long time to complete.

import os 
dir_path = os.path.dirname(os.path.realpath(__file__))

import speech_recognition as sr
import run_commands as commands
import speech_engine as engine
import time
import signal
from command_triggers import *
from playsound import playsound
import threading
import random

r = sr.Recognizer()
m = sr.Microphone(0)

# create all combinations of aiNames and triggerWords
for name in aiNames:
    aiTriggers.append(name)
    for word in aiGreetings:
        aiTriggers.append(word + " " + name)

# print(aiTriggers)

# checks if string starts with series of strings from list. 
# returns matching string, else blank
def startsWithList(text, list):
    for each in list:
        if text.startswith(each):
            # print(each)
            return each
    return ""

def callback(recognizer, audio):
    try:
        recognizedText = recognizer.recognize_google(audio).lower()
        print("Google Speech Recognition thinks you said " + recognizedText)
    
        # check if spoken text uses ai keywords. If yes remove trigger from text then run command.
        output = recognizedText.replace(startsWithList(recognizedText, aiTriggers), "").strip()
        # print(output)
        if output != "" and startsWithList(recognizedText, aiTriggers) != "":
            print("Google Speech Recognition thinks the command was " + output)
            # engine.speakText(command)
            commands.run(output)
        elif output == "":
            engine.speakText(random.choice(aiGreetings))
        

            
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))


r = sr.Recognizer()
m = sr.Microphone()
with m as source:
    r.adjust_for_ambient_noise(source, duration=2)  
    r.dynamic_energy_threshold = False
    r.energy_threshold = 400

stop_listening = r.listen_in_background(m, callback)
# print("Hi, I am ava. How can I help.")
# playsound(dir_path + '\startup.mp3')
engine.speakText("Hi, I am ava. How can I help.")

# commands.run("what is the weather")
while True:
    pass

This is the time command. I don't know of anything in it that should cause such a long delay, but maybe there is something? It runs fine on its own.

import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from datetime import datetime, timezone
import pytz
from geopy.geocoders import Nominatim
from timezonefinder import TimezoneFinder
import random
import speech_engine as engine

timeSpeakingOptions=["It's currently", "It's", "the time is", "the local time is"]

def getTime(location):
    print("time running")
    textToSpeak = ""

    if location == "":
        geolocator = Nominatim(user_agent="Your_Name")
        place = geolocator.geocode("battle ground, washington")

        timezone = TimezoneFinder().timezone_at(lng=place.longitude, lat=place.latitude)
        currentTime = datetime.now(pytz.timezone(timezone)).strftime("%I:%M %p")

        textToSpeak = "The current time is " + currentTime
        # print(textToSpeak)
    elif location != "":
        geolocator = Nominatim(user_agent="Your_Name")
        place = geolocator.geocode(location)

        timezone = TimezoneFinder().timezone_at(lng=place.longitude, lat=place.latitude)
        currentTime = datetime.now(pytz.timezone(timezone)).strftime("%I:%M %p")

        textToSpeak = ("the time in " +  location + " is " + currentTime)
        # print(textToSpeak)

    engine.speakText(textToSpeak)

# getTime("new york")
# getTime("")

I'm not necessarily trying to get the problem fixed, although that would be nice. Mainly I'd like to learn how to track down a problem like this. Right now I don't know enough to tell whether it's my code, the services I'm using, or something else. Thanks!
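One way to narrow this down is to time each stage separately (speech recognition, command dispatch, geocoding, speaking) and print which thread each stage runs in. This is a minimal sketch: timed and timings are hypothetical helper names, not part of any library used above, and the sleeps only stand in for the real calls.

```python
import threading
import time
from contextlib import contextmanager

# Hypothetical helper: stores how long each labelled stage took, in seconds.
timings = {}

@contextmanager
def timed(label):
    """Time the enclosed block and report which thread ran it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        print(f"[{threading.current_thread().name}] {label}: {elapsed:.3f} s")

if __name__ == "__main__":
    # Stand-ins for real calls such as recognizer.recognize_google(audio),
    # geolocator.geocode(location) or engine.speakText(textToSpeak).
    with timed("fake recognition"):
        time.sleep(0.2)
    with timed("fake geocode"):
        time.sleep(0.1)
```

In the real scripts you would wrap, for example, geolocator.geocode(location) in with timed("geocode"): and engine.speakText(...) in with timed("speak"):. If one stage accounts for almost all of the delay, that is the place to dig further.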


Solution

  • I have not tested it, but while True: pass is a busy-wait: it keeps one CPU core fully occupied and competes with the background listener thread for Python's global interpreter lock (GIL). This may explain why everything else is so slow. Instead, include a pause in such a loop so the CPU gets a break. Example:

    while True:
        time.sleep(1)
    

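    Also, I think you should stop the background listener when you are done with it. listen_in_background starts a listener thread and returns a function that stops it (stop_listening in the script above). Calling stop_listening(wait_for_stop=False), for example when handling KeyboardInterrupt, shuts the listener down cleanly instead of leaving it running.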
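To see the difference between the two loops, you can compare the CPU time the process uses (time.process_time()) during a short busy-wait and during a short sleep. This is a minimal sketch: busy_wait, sleep_wait and cpu_used are illustrative names, the busy-wait is bounded so it terminates, and exact numbers will vary by machine.

```python
import time

def busy_wait(seconds):
    """Spin like `while True: pass`, but stop after `seconds` so it terminates."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass

def sleep_wait(seconds):
    """Block in time.sleep(), which releases the GIL and lets other threads run."""
    time.sleep(seconds)

def cpu_used(fn, seconds):
    """Return the CPU time this process spent while running fn(seconds)."""
    start = time.process_time()
    fn(seconds)
    return time.process_time() - start

if __name__ == "__main__":
    print(f"busy-wait CPU time: {cpu_used(busy_wait, 0.5):.3f} s")
    print(f"sleep     CPU time: {cpu_used(sleep_wait, 0.5):.3f} s")
```

On a typical machine the busy-wait reports CPU time close to its wall-clock duration, while the sleeping version reports close to zero. That is the same difference as between while True: pass and while True: time.sleep(1) in the main script.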