Tags: speech-recognition, sapi, microsoft-speech-platform

Does the MS Speech Platform 11 Recognizer support ARPA compiled grammars?


How can I use ARPA files with MS Speech? The documentation for the Microsoft Speech Platform 11 Recognizer implies that one can compile a grammar from an ARPA file.

I am able to compile an ARPA file -- for instance, the tiny example provided by Microsoft -- using the following command line:

CompileGrammar.exe -In stock.arpa -InFormat ARPA
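
For reference, an ARPA file is just a plain-text n-gram language model. The sketch below is not Microsoft's actual stock sample, only a minimal, made-up illustration of the layout that CompileGrammar.exe consumes (base-10 log probabilities, one section per n-gram order):

\data\
ngram 1=7
ngram 2=6

\1-grams:
-0.9 <s> -0.3
-0.9 </s>
-0.9 will -0.3
-0.9 stock -0.3
-0.9 go -0.3
-0.9 up -0.3
-0.9 down -0.3

\2-grams:
-0.3 <s> will
-0.3 will stock
-0.3 stock go
-0.3 go up
-0.3 go down
-0.3 up </s>

\end\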

I'm able to use the resulting CFG file in the following test:

using Microsoft.Speech.Recognition;
using System.Globalization;
using NUnit.Framework;

// ...

using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
{
    engine.LoadGrammar(new Grammar("stock.cfg"));
    var result = engine.EmulateRecognize("will stock go up");
    Assert.That(result, Is.Not.Null);
}

This test passes, but note that it uses EmulateRecognize(). When I switch to using an actual audio file, like this:

using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US"))) 
{
    engine.LoadGrammar(new Grammar("stock.cfg"));
    engine.SetInputToWaveFile("go-up.wav");
    var result = engine.Recognize();
}

result is always null and the test fails.

Microsoft states quite clearly that it's supported, yet even very simple examples don't seem to work. What am I doing wrong?


Solution

  • There are two different answers to this question depending on which version of the Microsoft Speech SDK you're using. (See: What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?)

    System.Speech (Desktop Version)

    In this case, see seiya1223's answer. The sample code there works great.

    Microsoft.Speech (Server Version)

    Perhaps because the server version does not include the "dictation engine," the Microsoft.Speech library apparently never matches an ARPA-sourced CFG. It does, however, still report its best hypothesis of what was said, via the SpeechRecognitionRejected event. Here are the necessary changes from seiya1223's desktop code:

    1. Change your using statement from System.Speech to Microsoft.Speech, of course.
    2. Add an event handler for the SpeechRecognitionRejected event.
    3. In your event handler, examine the e.Result.Text property for the final hypothesis.

    The following snippet should help illustrate:

    static string transcription;

    static void Main(string[] args)
    {
      using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
      {
        // For an ARPA-sourced CFG, the final hypothesis arrives via this event.
        recognizer.SpeechRecognitionRejected += SpeechRecognitionRejectedHandler;
        // ...
      }
    }

    static void SpeechRecognitionRejectedHandler(object sender, SpeechRecognitionRejectedEventArgs e)
    {
      // Keep the recognizer's best guess, even though the result was "rejected".
      if (e.Result != null && !string.IsNullOrEmpty(e.Result.Text))
        transcription = e.Result.Text;
    }
    

    This handler is called once, at the end of recognition. For example, here is the output from seiya1223's code, but using all of the available event handlers and a bunch of extra logging; note the SpeechRecognitionRejectedHandler entry near the end:

    Starting asynchronous recognition...
    In SpeechDetectedHandler:
    - AudioPosition = 00:00:01.2300000
    In SpeechHypothesizedHandler:
    - Grammar Name = Stock; Result Text = Go
    In SpeechHypothesizedHandler:
    - Grammar Name = Stock; Result Text = will
    In SpeechHypothesizedHandler:
    - Grammar Name = Stock; Result Text = will Stock
    In SpeechHypothesizedHandler:
    - Grammar Name = Stock; Result Text = will Stock Go
    In SpeechHypothesizedHandler:
    - Grammar Name = Stock; Result Text = will Stock Go Up
    In SpeechRecognitionRejectedHandler:
    - Grammar Name = Stock; Result Text = will Stock Go Up

    In RecognizeCompletedHandler.
    - AudioPosition = 00:00:03.2000000; InputStreamEnded = True
    - No result.
    Done.
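
    To tie the pieces together, here is a sketch of how this might look end to end. It assumes the stock.cfg and go-up.wav files from the question; running synchronously and logging to the console are just illustrative choices, not part of seiya1223's original code:

    using System;
    using System.Globalization;
    using Microsoft.Speech.Recognition;

    class Program
    {
      static string transcription;

      static void Main(string[] args)
      {
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
          recognizer.LoadGrammar(new Grammar("stock.cfg"));  // ARPA-sourced CFG from the question
          recognizer.SetInputToWaveFile("go-up.wav");

          // For this grammar the hypothesis arrives via the "rejected" event.
          recognizer.SpeechRecognitionRejected += (s, e) =>
          {
            if (e.Result != null && !string.IsNullOrEmpty(e.Result.Text))
              transcription = e.Result.Text;
          };

          // Recognize() itself returns null here, so ignore its return value
          // and rely on the handler above.
          recognizer.Recognize();

          Console.WriteLine("Transcription: {0}", transcription ?? "(none)");
        }
      }
    }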