I am building an application for the Amazon Echo in python. When I speak a bad utterance that the Amazon Echo does not recognize, my skill quits and returns me to the home screen. I am looking to prevent this and repeat what was just uttered by the Amazon Echo.
To try to achieve this to some extent I try calling a function to say something when the session ends or bad input is detected.
def on_session_ended(session_ended_request, session):
"""
Called when the user ends the session.
Is not called when the skill returns should_end_session=true
"""
print("on_session_ended requestId=" + session_ended_request['requestId'] +
", sessionId=" + session['sessionId'])
return get_session_end_response()
However, I just get an error from the Echo -- this function, on_session_ended
is never entered.
So how do I conduct error catching and handling on the Amazon Echo?
UPDATE 1: I reduced the number of utterances and the number of intents with custom slots to one. Now a user should only speak A, B, C, or D. If they speak anything outside of this, then the intent is still triggered but with no slot value. Thus, I can do some error checking based on whether the slot value is there or not. However, this seems like not the best way to do it. When I try to add in intents with no slots and a corresponding utterance, anything that doesn't match either of my intents defaults to this new intent. How can I resolve these issues?
UPDATE 2: Here are some relevant sections of my code.
Intent handlers:
def lambda_handler(event, context):
print("Python START -------------------------------")
print("event.session.application.applicationId=" +
event['session']['application']['applicationId'])
if event['session']['new']:
on_session_started({'requestId': event['request']['requestId']},
event['session'])
if event['request']['type'] == "LaunchRequest":
return on_launch(event['request'], event['session'])
elif event['request']['type'] == "IntentRequest":
return on_intent(event['request'], event['session'])
elif event['request']['type'] == "SessionEndedRequest":
return on_session_ended(event['request'], event['session'])
def on_session_started(session_started_request, session):
print("on_session_started requestId=" + session_started_request['requestId']
+ ", sessionId=" + session['sessionId'])
def on_launch(launch_request, session):
""" Called when the user launches the skill without specifying what they want """
print("on_launch requestId=" + launch_request['requestId'] +
", sessionId=" + session['sessionId'])
# Dispatch to your skill's launch
return create_new_user()
def on_intent(intent_request, session):
""" Called when the user specifies an intent for this skill """
print("on_intent requestId=" + intent_request['requestId'] +
", sessionId=" + session['sessionId'])
intent = intent_request['intent']
intent_name = intent['name']
attributes = session["attributes"] if 'attributes' in session else None
intent_slots = intent['slots'] if 'slots' in intent else None
# Dispatch to skill's intent handlers
# TODO : Authenticate users
# TODO : Start session in a different spot depending on where user left off
if intent_name == "StartQuizIntent":
return create_new_user()
elif intent_name == "AnswerIntent":
return get_answer_response(intent_slots, attributes)
elif intent_name == "TestAudioIntent":
return get_audio_response()
elif intent_name == "AMAZON.HelpIntent":
return get_help_response()
elif intent_name == "AMAZON.CancelIntent":
return get_session_end_response()
elif intent_name == "AMAZON.StopIntent":
return get_session_end_response()
else:
return get_session_end_response()
def on_session_ended(session_ended_request, session):
"""
Called when the user ends the session.
Is not called when the skill returns should_end_session=true
"""
print("on_session_ended requestId=" + session_ended_request['requestId'] +
", sessionId=" + session['sessionId'])
return get_session_end_response()
Then we have the functions that actually get called and the response builders. I have edited some of the code for privacy. I haven't built up all the display response text fields and have some uids hard coded so I don't have to worry about authentication yet.
# --------------- Functions that control the skill's behavior ------------------
####### GLOBAL SETTINGS ########
utility_background_image = "https://i.imgur.com/XXXX.png"
def get_welcome_response():
""" Returns the welcome message if a user invokes the skill without specifying an intent """
session_attributes = {}
card_title = ""
speech_output = ("Hello and welcome ... quiz .... blah blah ...")
reprompt_text = "Ask me to start and we will begin the test!"
should_end_session = False
# visual responses
primary_text = '' # TODO
secondary_text = '' # TODO
return build_response(session_attributes,
build_speechlet_response(card_title, speech_output, reprompt_text,
should_end_session,
build_display_response(utility_background_image,
card_title, primary_text,
secondary_text)))
def get_session_end_response():
""" Returns the ending message if a user errs or exits the skill """
session_attributes = {}
card_title = ""
speech_output = "Thank you for your time!"
reprompt_text = ''
should_end_session = True
# visual responses
primary_text = '' # TODO
secondary_text = '' # TODO
return build_response(session_attributes,
build_speechlet_response(card_title, speech_output, reprompt_text,
should_end_session,
build_display_response(utility_background_image,
card_title, primary_text,
secondary_text)))
def get_audio_response():
""" Tests the audio capabilities of the echo """
session_attributes = {}
card_title = "" # TODO : keep no 'welcome'?
speech_output = ""
reprompt_text = ""
should_end_session = False
# visual responses
primary_text = '' # TODO
secondary_text = '' # TODO
return build_response(session_attributes,
build_speechlet_response(card_title, speech_output, reprompt_text,
should_end_session, build_audio_response()))
def create_new_user():
""" Creates a new user that the server will recognize and whose action will be stored in db """
url = "http://XXXXXX:XXXX/create_user"
response = urllib.request.urlopen(url)
data = json.loads(response.read().decode('utf8'))
uuid = data["uuid"]
return ask_question(uuid)
def query_server(uuid):
""" Requests to get num_questions number of questions from the server """
url = "http://XXXXXXXX:XXXX/get_question_json?uuid=%s" % (uuid) # TODO : change needs to to be uuid
response = urllib.request.urlopen(url)
data = json.loads(response.read().decode('utf8'))
if data["status"]:
question = data["data"]["question"]
quid = data["data"]["quid"]
next_quid = data["data"]["next_quid"] # TODO : will we need any of this?
topic = data["data"]["topic"]
type = data["data"]["type"]
media_type = data["data"]["media_type"] # either 'IMAGE', 'AUDIO', or 'VIDEO'
answers = data["data"]["answer"] # list of answers stored in order they should be spoken
images = data["data"]["image"] # list of images that correspond to order of answers list
audio = data["data"]["audio"]
video = data["data"]["video"]
question_data = {"status": True, "data":{"question": question, "quid": quid, "answers": answers,
"media_type": media_type, "images": images, "audio": audio, "video": video}}
if next_quid is "None":
return None
return question_data
else:
return {"status": False}
def ask_question(uuid):
""" Returns a quiz question to the user since they specified a QuizIntent """
question_data = query_server(uuid)
if question_data is None:
return get_session_end_response()
card_title = "Ask Question"
speech_output = ""
session_attributes = {}
should_end_session = False
reprompt_text = ""
# visual responses
display_title = ""
primary_text = ""
secondary_text = ""
images = []
answers = []
if question_data["status"]:
session_attributes = {
"quid": question_data["data"]["quid"],
"uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24",
"question_start_time": time.time()
}
question = question_data["data"]["question"]
answers = question_data["data"]["answers"] # answers are shuffled when pulled from server
images = question_data["data"]["images"]
# TODO : consider different media types
speech_output += question
reprompt_text += ("Please choose an answer using the official NATO alphabet. For example," +
" A is alpha, B is bravo, and C is charlie.")
else:
speech_output += "Oops! This is embarrassing. There seems to be a problem with the server."
reprompt_text += "I don't exactly know where to go from here. I suggest restarting this skill."
return build_response(session_attributes, build_speechlet_response(card_title, speech_output,
reprompt_text, should_end_session,
build_display_response_list_template2(title=question, image_urls=images, answers=answers)))
def send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given):
""" Sends the users responses back to the server to be stored in the database """
url = ("http://XXXXXXXX:XXXX/send_answers?uuid=%s&quid=%s&time=%s&answer_given=%s" %
(uuid, quid, time_used_for_question, answer_given))
response = urllib.request.urlopen(url)
data = json.loads(response.read().decode('utf8'))
return data["status"]
def get_answer_response(slots, attributes):
""" Returns a correct/incorrect message to the user depending on their AnswerIntent """
# get time, quid, and uuid from attributes
question_start_time = attributes["question_start_time"]
quid = attributes["quid"]
uuid = attributes["uuid"]
# get answer from slots
try:
answer_given = slots["Answer"]["value"].lower()
except KeyError:
return get_session_end_response()
# calculate a rough estimate of the time it took to answer question
time_used_for_question = str(int(time.time() - question_start_time))
# record response data by sending it to the server
send_quiz_responses_to_server(uuid, quid, time_used_for_question, answer_given)
return ask_question(uuid)
def get_help_response():
""" Returns a help message to the user since they called AMAZON.HelpIntent """
session_attributes = {}
card_title = ""
speech_output = "" # TODO
reprompt_text = "" # TODO
should_end_session = False
return build_response(session_attributes,
build_speechlet_response(card_title, speech_output, reprompt_text, should_end_session,
build_display_response(utility_background_image, card_title)))
# --------------- Helpers that build all of the responses ----------------------
def build_hint_response(hint):
"""
Builds the hint response for a display.
For example, Try "Alexa, play number 1" where "play number 1" is the hint.
"""
return {
"type": "Hint",
"hint": {
"type": "RichText",
"text": hint
}
}
def build_display_response(url='', title='', primary_text='', secondary_text='', tertiary_text=''):
"""
Builds the display template for the echo show to display.
Echo show screen is 1024px x 600px
For additional image size requirements, see the display interface reference.
"""
return [{
"type": "Display.RenderTemplate",
"template": {
"type": "BodyTemplate1",
"token": "question",
"title": title,
"backgroundImage": {
"contentDescription": "Question",
"sources": [
{
"url": url
}
]
},
"textContent": {
"primaryText": {
"type": "RichText",
"text": primary_text
},
"secondaryText": {
"type": "RichText",
"text": secondary_text
},
"tertiaryText": {
"type": "RichText",
"text": tertiary_text
}
}
}
}]
def build_list_item(url='', primary_text='', secondary_text='', tertiary_text=''):
return {
"token": "question_item",
"image": {
"sources": [
{
"url": url
}
],
"contentDescription": "Question Image"
},
"textContent": {
"primaryText": {
"type": "RichText",
"text": primary_text
},
"secondaryText": {
"text": secondary_text,
"type": "PlainText"
},
"tertiaryText": {
"text": tertiary_text,
"type": "PlainText"
}
}
}
def build_display_response_list_template2(title='', image_urls=[], answers=[]):
list_items = []
for image, answer in zip(image_urls, answers):
list_items.append(build_list_item(url=image, primary_text=answer))
return [{
"type": "Display.RenderTemplate",
"template": {
"type": "ListTemplate2",
"token": "question",
"title": title,
"backgroundImage": {
"contentDescription": "Question Background",
"sources": [
{
"url": "https://i.imgur.com/HkaPLrK.png"
}
]
},
"listItems": list_items
}
}]
def build_audio_response(url): # TODO add a display repsonse here as well
""" Builds audio response. I.e. plays back an audio file with zero offset """
return [{
"type": "AudioPlayer.Play",
"playBehavior": "REPLACE_ALL",
"audioItem": {
"stream": {
"token": "audio_clip",
"url": url,
"offsetInMilliseconds": 0
}
}
}]
def build_speechlet_response(title, output, reprompt_text, should_end_session, directive=None):
""" Builds speechlet response and puts display response inside """
return {
'outputSpeech': {
'type': 'PlainText',
'text': output
},
'card': {
'type': 'Simple',
'title': title,
'content': output
},
'reprompt': {
'outputSpeech': {
'type': 'PlainText',
'text': reprompt_text
}
},
'shouldEndSession': should_end_session,
'directives': directive
}
def build_response(session_attributes, speechlet_response):
""" Builds the complete response to send back to Alexa """
return {
'version': '1.0',
'sessionAttributes': session_attributes,
'response': speechlet_response
}
UPDATE 3: I updated the intents so there is now one custom intent that takes a custom slot, and then I have another custom intent that takes no slots. These custom intents also have there own sample utterances. Both the intents and their utterances are listed below. When I start the skill, it works fine. Then when I say/type "zoo zoo zoo" to test bad input, I get an error. Both the request for "zoo zoo zoo" and the response are listed below. I am looking for a good way to catch this bad input error and resume/revert the skill back to its previous state.
Intents:
...
{
"intent": "TestAudioIntent"
},
{
"slots": [
{
"name": "Answer",
"type": "LETTER"
}
],
"intent": "AnswerIntent"
},
...
Sample Utterances:
AnswerIntent {Answer}
AnswerIntent I think it is {Answer}
TestAudioIntent test the audio
Example JSON request:
{
"session": {
"new": false,
"sessionId": "SessionId.574f0b74-be17-4f79-bbd6-ce926a1bf856",
"application": {
"applicationId": "XXXXXXXX"
},
"attributes": {
"quid": "7fa9fcbf-35db-4bbd-ac73-37977bcef563",
"question_start_time": 1515691612.7381804,
"uuid": "df876c9d-cd41-4b9f-a3b9-3ccd1b441f24"
},
"user": {
"userId": "XXXXXXXX"
}
},
"request": {
"type": "IntentRequest",
"requestId": "EdwRequestId.23765cb0-f327-4f52-a9a3-b9f92a375a5f",
"intent": {
"name": "TestAudioIntent",
"slots": {}
},
"locale": "en-US",
"timestamp": "2018-01-11T17:26:57Z"
},
"context": {
"AudioPlayer": {
"playerActivity": "IDLE"
},
"System": {
"application": {
"applicationId": "XXXXXXXX"
},
"user": {
"userId": "XXXXXXXX"
},
"device": {
"supportedInterfaces": {
"Display": {
"templateVersion": "1",
"markupVersion": "1"
}
}
}
}
},
"version": "1.0"
}
And I get the following testing error as a response:
The remote endpoint could not be called, or the response it returned was invalid.
What I ended up doing is using something similar to Amazon's dialogue management system. If a user says something that doesn't fill a slot, I re-prompt them with that question. My goal is to record a user's statements/answers after each time they speak, thus I didn't use the built-in dialogue management. Additionally, I used Amazon's slot synonyms for all my slots to make my modal more robust.
I still don't know that this is the best way, but it is a starting point and seems to work O.K....