pepper google-speech-to-text-api

Inquiry on the possibility of giving ALSpeechRecognition access to the Google Speech-to-Text API or another online API with a huge vocabulary list


Problem description

I want to give a Pepper robot the ability to recognize speech and convert it to text using a huge vocabulary list. Is there a way to feed the setVocabulary method an external vocabulary list from another API, such as the Google Speech-to-Text API, so that I don't have to manually set which words and phrases to recognize? Basically, I want something much like the recognize_google method in the Python library speech_recognition. Or perhaps I'm going about this the wrong way? Any alternatives are welcome.

My current setup

I am currently using the ALSpeechRecognition API (Aldebaran software), with its setVocabulary method, to set a list of phrases to recognize. I subscribe to the WordRecognized event, which stores any recognized word or phrase, and I retrieve the result using the getData method from the ALMemory API.

This is the Python script I currently use to get text from speech:

#!/usr/bin/env python2.7
from naoqi import ALProxy
import time
ROBOT_IP = "130.191.48.26"

# Creates a proxy on the speech-recognition module
asr = ALProxy("ALSpeechRecognition", ROBOT_IP, 9559)
mem = ALProxy("ALMemory", ROBOT_IP, 9559)
asr.setLanguage("English")

# Example: adds "yes", "no", "please listen to me" and "okay" to the vocabulary (without word spotting)
vocabulary = ["yes", "no", "please listen to me", "okay"]
asr.setVocabulary(vocabulary, False)

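# Start the speech recognition engine with subscriber name Test_ASR
asr.subscribe("Test_ASR")
print('Speech recognition engine started')
# Give the user a few seconds to speak
time.sleep(5)
# Read the last recognition result stored in ALMemory
word = mem.getData("WordRecognized")
print(word)
# Stop the engine for this subscriber
asr.unsubscribe("Test_ASR")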
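
For reference, the value read from WordRecognized is not a plain string. As far as I understand the NAOqi documentation, it is a flat list of the form [phrase_1, confidence_1, phrase_2, confidence_2, ...]. A minimal sketch for picking the most confident phrase from such a list is shown here. The helper name best_recognized_phrase and the 0.4 threshold are my own assumptions, not part of the NAOqi API.

```python
def best_recognized_phrase(word_recognized, min_confidence=0.4):
    """Return (phrase, confidence) for the most confident candidate, or None.

    word_recognized is assumed to be the flat list stored under the
    "WordRecognized" ALMemory key: [phrase_1, confidence_1, phrase_2, ...].
    min_confidence is an arbitrary threshold chosen for illustration.
    """
    if not word_recognized:
        return None
    # Pair phrases (even indexes) with their confidences (odd indexes).
    pairs = list(zip(word_recognized[0::2], word_recognized[1::2]))
    if not pairs:
        return None
    phrase, confidence = max(pairs, key=lambda pair: pair[1])
    # Reject empty phrases and low-confidence matches.
    if not phrase or confidence < min_confidence:
        return None
    return phrase, confidence
```

In the script above this would be called as best_recognized_phrase(word) right after the getData call.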

Solution

  • ALSpeechRecognition is only practical when there are a few short words or phrases to choose from. For long phrase lists, it is better to use Pepper's ALDialog module. Alternatively, you can call the Google Speech API externally (see e.g. the function recognize_Google() in the Pepper-Controller), but the round trip to the online service adds an unwanted delay to the robot's speech processing.
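
To illustrate the external route, here is a rough sketch that uses the speech_recognition package mentioned in the question. It assumes the audio has already been recorded on the robot and copied to the computer as a WAV file; how you record and transfer it is left out. The function name transcribe_wav and the file name in the usage note are placeholders of mine, and this is not the code used in the Pepper-Controller.

```python
def transcribe_wav(wav_path, language="en-US"):
    """Send a WAV file to Google's web speech API and return the transcript.

    Returns None when the speech could not be understood. A network or
    API failure raises speech_recognition.RequestError to the caller.
    """
    # Third-party package (pip install SpeechRecognition), imported lazily
    # so the module can be loaded even where the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        # Read the whole file into an AudioData object.
        audio = recognizer.record(source)
    try:
        # No vocabulary list is needed: the online service does free-form
        # recognition and returns the most likely transcript as a string.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
```

Usage would be something like text = transcribe_wav("pepper_recording.wav"). Each call uploads the audio and waits for the reply, which is where the extra delay comes from.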