
Dragon NaturallySpeaking Programmers


Is there any way to incorporate Dragon NaturallySpeaking (DNS) into an event-driven program? My boss would really like it if I used DNS to record user voice input without writing it to the screen, saving it directly to XML instead. I've been researching this for several days now and I cannot see a way to do it without the (really expensive) SDK, and I don't even know whether it would work then.

Microsoft Speech lets you write a (Python) program in which its speech recognizer waits until it detects a speech event and then processes it. It can also suggest alternative phrases to the one it thinks is the best guess, and it can record the .wav file for later use. Sample code:

import win32com.client

# MsSpeech, SpRecoContext, NBEST and TAG_PHRASE are defined elsewhere (not shown)
class RecoEventHandler(SpRecoContext):
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        res = win32com.client.Dispatch(Result)
        phrase = res.PhraseInfo.GetText()
        # from here I would save it as XML

        # write reco phrases (the N best alternative guesses)
        altPhrases = res.Alternates(NBEST)
        for altPhrase in altPhrases:
            nodePhrase = self.doc.createElement(TAG_PHRASE)

# hook the handler up once the class exists
spEngine = MsSpeech()
spEngine.setEventHandler(RecoEventHandler(spEngine.context))

I cannot seem to make DNS do this. The closest I can hack together is:

while keepGoing:
    yourWords = raw_input("Your input: ")
    transcript_el = createTranscript(doc, "user", yourWords)
    speech_el.appendChild(transcript_el)
    if yourWords == 'bye':
        break

It even has the horrible side effect of making the user say "new-line" after every sentence! Not the preferred solution at all! Is there any way to make DNS do what Microsoft Speech does?

FYI: I know the logical solution would be to simply switch to Microsoft Speech but let's assume, just for grins and giggles, that that is not an option.

UPDATE - Has anyone bought the SDK? Did you find it useful?


Solution

  • Solution: download Natlink from http://qh.antenna.nl/unimacro/installation/installation.html. It's not quite as flexible as SAPI, but it covers the basics and I got almost everything I needed out of it. Heads up: both Natlink and Python need to be installed for all users on your machine or it won't work properly, and it works with every version of Python BUT 2.4.

    Once it's installed, documentation for all supported commands is in C:\NatLink\NatLink\MiscScripts\natlink.txt. The command reference comes after the list of updates at the top of the file.

    Example code:

    import natlink

    # raiseError, shutdownServer and LoggerGrammar come from my production code (not shown).
    # Wrapped in a function (the name is arbitrary) so the early return is valid.
    def runLogger():
        # make sure DNS is running before you start
        if not natlink.isNatSpeakRunning():
            raiseError('must start up Dragon NaturallySpeaking first!')
            shutdownServer()
            return
        # connect to natlink and load the grammar it's supposed to recognize
        natlink.natConnect()
        loggerGrammar = LoggerGrammar()
        loggerGrammar.initialize()
        if natlink.getMicState() == 'off':
            natlink.setMicState('on')
        userName = 'Danni'
        natlink.openUser(userName)
        # natlink.waitForSpeech() blocks in a continuous loop waiting for input.
        # Results are sent to the gotResultsObject method of the logger grammar.
        natlink.waitForSpeech()
        natlink.natDisconnect()
    

    The code's severely abbreviated from my production version, but I hope you get the idea. The only problem now is that I still have to return to the mini-window that natlink.waitForSpeech() creates and click 'close' before I can exit the program safely. A way to signal the window to close from Python without using the timeout parameter would be fantastic.
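For anyone wondering what could go inside the gotResultsObject method mentioned above, here is a minimal sketch of turning a results object into XML. It is not the production code. It assumes the results object exposes getWords(n), which returns the n-th best guess and raises an out-of-range exception once the alternatives run out (that is how I read natlink.txt, so check it against your version). The helper names collect_choices and build_transcript, the "transcript" tag, the "speaker" and "rank" attributes, and the value of TAG_PHRASE are all made up for illustration.

```python
from xml.dom import minidom

# Hypothetical tag name; the value of the TAG_PHRASE constant in the question is not shown.
TAG_PHRASE = "phrase"


def collect_choices(res_obj, out_of_range, max_choices=10):
    """Return up to max_choices word lists from a natlink-style results object.

    Assumption: res_obj.getWords(n) returns the n-th best guess as a sequence
    of words and raises out_of_range when there is no n-th alternative.
    """
    choices = []
    for n in range(max_choices):
        try:
            choices.append(list(res_obj.getWords(n)))
        except out_of_range:
            break  # no more alternatives available
    return choices


def build_transcript(doc, speaker, choices):
    """Create a <transcript> element with one phrase element per choice."""
    transcript_el = doc.createElement("transcript")
    transcript_el.setAttribute("speaker", speaker)
    for rank, words in enumerate(choices):
        phrase_el = doc.createElement(TAG_PHRASE)
        phrase_el.setAttribute("rank", str(rank))  # 0 is the best guess
        phrase_el.appendChild(doc.createTextNode(" ".join(words)))
        transcript_el.appendChild(phrase_el)
    return transcript_el
```

Inside the grammar callback this would be used roughly as choices = collect_choices(resObj, natlink.OutOfRange), followed by speech_el.appendChild(build_transcript(doc, "user", choices)), which takes the place of the raw_input loop from the question. Passing the exception class in as a parameter keeps the helpers importable on a machine without Dragon installed.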