
No response to an HTTP GET request in WebAPI in .NET 4.5 while using SpeechSynthesis to convert text to speech


I'm trying to set up a simple web service using WebAPI. Here is the code I have:

public class SpeakController : ApiController
{
    //
    // api/speak

    public HttpResponseMessage Get(String textToConvert, String outputFile, string gender, string age = "Adult")
    {
        VoiceGender voiceGender = (VoiceGender)Enum.Parse(typeof(VoiceGender), gender);
        VoiceAge voiceAge = (VoiceAge)Enum.Parse(typeof(VoiceAge), age);

        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
            synthesizer.SetOutputToWaveFile(outputFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
            synthesizer.Speak(textToConvert);
        }

        return Request.CreateResponse(HttpStatusCode.OK, new Response { HttpStatusCode = (int)HttpStatusCode.OK, Message = "Payload Accepted." });
    }
}

The code is fairly straightforward and by no means production ready. But in my tests, no request to the controller ever gets a response.

I tried the same with Postman (a REST client for Chrome) and got the same result. Although I do want this to be a blocking call, I also tried changing synthesizer.Speak to synthesizer.SpeakAsync and ran into the same issue.

However, when I test the two parts separately, as shown below, each works as expected.

Testing WebAPI call with speech section commented out:

public class SpeakController : ApiController
{
    //
    // api/speak

    public HttpResponseMessage Get(String textToConvert, String outputFile, string gender, string age = "Adult")
    {
        VoiceGender voiceGender = (VoiceGender)Enum.Parse(typeof(VoiceGender), gender);
        VoiceAge voiceAge = (VoiceAge)Enum.Parse(typeof(VoiceAge), age);

        //using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        //{
        //  synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
        //  synthesizer.SetOutputToWaveFile(outputFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
        //  synthesizer.Speak(textToConvert);
        //}

        return Request.CreateResponse(HttpStatusCode.OK, new Response { HttpStatusCode = (int)HttpStatusCode.OK, Message = "Payload Accepted." });
    }
}

Testing speech separately in a console application:

static string usageInfo = "Invalid or no input arguments!"
    + "\n\nUsage: initiatives \"text to speak\" c:\\path\\to\\generate.wav gender"
    + "\nGender:\n\tMale or \n\tFemale"
    + "\n";

static void Main(string[] args)
{
    if (args.Length != 3)
    {
        Console.WriteLine(usageInfo);
    }
    else
    {
        ConvertStringToSpeechWav(args[0], args[1], (VoiceGender)Enum.Parse(typeof(VoiceGender), args[2]));
    }

    Console.WriteLine("Press Enter to continue...");
    Console.ReadLine();
}

static void ConvertStringToSpeechWav(String textToConvert, String pathToCreateWavFile, VoiceGender gender, VoiceAge age = VoiceAge.Adult)
{
    using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
    {
        synthesizer.SelectVoiceByHints(gender, age);
        synthesizer.SetOutputToWaveFile(pathToCreateWavFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
        synthesizer.Speak(textToConvert);
    }
}

WebAPI and SpeechSynthesis do not seem to play well together. Any help in figuring this out would be greatly appreciated.

Thanks!


Solution

  • I have no idea why this happens, but running your SpeechSynthesizer on a separate thread seems to do the trick (an incompatible threading model, perhaps?). Here's how I've done it in the past.

    Based on: Ultra Fast Text to Speech (WAV -> MP3) in ASP.NET MVC

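    public dynamic Post(dynamic req)
    {
        try
        {
            string phrase = req["phrase"].Value;

            var stream = new MemoryStream();
            Exception synthError = null;

            // Run the synthesizer on its own thread rather than on the request thread.
            var t = new System.Threading.Thread(() =>
                {
                    try
                    {
                        using (var synth = new SpeechSynthesizer())
                        {
                            synth.SetOutputToWaveStream(stream);
                            synth.Speak(phrase);
                            synth.SetOutputToNull(); // detach from the stream before disposing
                        }
                    }
                    catch (Exception ex)
                    {
                        // The outer try/catch does not cover this thread, and an
                        // unhandled exception here would terminate the process.
                        synthError = ex;
                    }
                });

            t.Start();
            t.Join(); // block until the WAV data has been written

            if (synthError != null)
            {
                return new HttpResponseMessage(HttpStatusCode.InternalServerError);
            }

            // Rewind so the response streams the audio from the beginning.
            stream.Position = 0;

            var resp = new HttpResponseMessage(HttpStatusCode.OK);
            resp.Content = new StreamContent(stream);

            resp.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            resp.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
            resp.Content.Headers.ContentDisposition.FileName = "phrase.wav";

            return resp;
        }
        catch
        {
            return new HttpResponseMessage(HttpStatusCode.InternalServerError);
        }
    }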
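
The workaround above (start a thread, do the speech work there, then join it) could be factored into a small reusable helper so it can also wrap the GET action from the question. What follows is only a sketch under stated assumptions: `DedicatedThread` and `Run` are hypothetical names introduced here for illustration, not part of Web API or System.Speech, and the helper does nothing beyond moving the work onto a fresh thread and passing the result or exception back to the caller.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper (an illustration, not a library API).
//
// Possible use inside the question's Get action (requires System.Speech):
//
//   DedicatedThread.Run(() =>
//   {
//       using (var synthesizer = new SpeechSynthesizer())
//       {
//           synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
//           synthesizer.SetOutputToWaveFile(outputFile,
//               new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
//           synthesizer.Speak(textToConvert);
//       }
//       return true; // placeholder; the caller does not need a result
//   });
public static class DedicatedThread
{
    // Runs 'work' on a new thread, blocks until it finishes, and returns
    // its result. Any exception thrown by 'work' is rethrown on the caller.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        ExceptionDispatchInfo error = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture instead of letting it escape: an unhandled
                // exception on this thread would terminate the process.
                error = ExceptionDispatchInfo.Capture(ex);
            }
        });

        thread.IsBackground = true;
        thread.Start();
        thread.Join(); // keeps the call blocking, as the question wants

        if (error != null)
        {
            error.Throw(); // rethrow with the original stack trace
        }
        return result;
    }
}
```

Because the exception is rethrown on the calling thread, an existing try/catch around the controller action keeps working as it would for inline code. The request thread still waits in Join, so this keeps the blocking behaviour rather than making the action asynchronous.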