
Azure Text to Speech: convert SpeakTextAsync output to a valid NAudio WaveStream


I am trying to use the Azure Text to Speech service (Microsoft.CognitiveServices.Speech) to convert text to audio, and then convert the audio to another format using NAudio.

I already have the NAudio part working with an MP3 file, but I cannot get any output from SpeakTextAsync that works with NAudio.

This is the code where I try to play the audio with NAudio (as a temporary test), but it does not play anything valid.

var waveStream = new RawSourceWaveStream(azureStream, new WaveFormat());
using (var waveOut = new WaveOutEvent())
{
    waveOut.Init(waveStream);
    Log.Logger.Debug("Playing sounds...");
    waveOut.Play();
    while (waveOut.PlaybackState == PlaybackState.Playing)
    {
        Thread.Sleep(1000);
    }
}

These are the two possible outputs I found, but I am probably missing something important:

Option 1 (AudioDataStream):

using var synthesizer = new SpeechSynthesizer(_config, null);
using var result = await synthesizer.SpeakTextAsync(text);
switch (result.Reason)
{
    case ResultReason.SynthesizingAudioCompleted:
        Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
        return AudioDataStream.FromResult(result);
    case ResultReason.Canceled:
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        return null;
    }
    default:
        return null;
}

Option 2 (PullAudioOutputStream):

PullAudioOutputStream stream = new PullAudioOutputStream();
AudioConfig config = AudioConfig.FromStreamOutput(stream);

using var synthesizer = new SpeechSynthesizer(_config, null);
using var result = await synthesizer.SpeakTextAsync(text);
switch (result.Reason)
{
    case ResultReason.SynthesizingAudioCompleted:
        Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
        return stream;
    case ResultReason.Canceled:
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        return null;
    }
    default:
        return null;
}

So how do I convert the text-to-speech output to a valid NAudio format?


Solution

  • Kevin,

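    What do you need NAudio for? If it is only for playback, it is not necessary. The following line plays the text out loud, provided the synthesizer uses the default speaker output (for example new SpeechSynthesizer(_config) rather than passing null as the AudioConfig):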

    await synthesizer.SpeakTextAsync(text);
    

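    If you need the result of the speech synthesis in NAudio for any other reason, read result.AudioData through a WaveFileReader. With the default output format the data should start with a WAV (RIFF) header, so WaveFileReader picks up the actual format from it, whereas new WaveFormat() assumes 44.1 kHz 16-bit stereo.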

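    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
    {
        // result.AudioData holds the synthesized audio as a byte array.
        using var stream = new MemoryStream(result.AudioData);
        // WaveFileReader reads the audio format from the WAV header in the data.
        using var reader = new WaveFileReader(stream);
        using var player = new WaveOutEvent();

        player.Init(reader);
        player.Play();

        // Play() returns immediately, so wait until playback has finished.
        while (player.PlaybackState == PlaybackState.Playing)
        {
            Thread.Sleep(500);
        }
    }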
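
Since the stated goal is converting the audio to another format rather than playback, the same WaveFileReader approach can feed a conversion step. The following is only a minimal sketch, assuming result.AudioData is a complete WAV (RIFF) byte array as in the answer above. The class name TtsAudioConverter and the method name Resample are hypothetical, and the target sample rate is an arbitrary choice for illustration.

```csharp
using System.IO;
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Hypothetical helper, not part of NAudio or the Speech SDK.
public static class TtsAudioConverter
{
    // Takes a complete WAV (RIFF) byte array, resamples it to targetSampleRate,
    // and returns a new 16-bit PCM WAV byte array.
    public static byte[] Resample(byte[] wavBytes, int targetSampleRate)
    {
        using var input = new MemoryStream(wavBytes);
        // The reader takes sample rate, bit depth and channel count from the header.
        using var reader = new WaveFileReader(input);

        // Fully managed resampler; it works on floating point samples.
        var resampler = new WdlResamplingSampleProvider(
            reader.ToSampleProvider(), targetSampleRate);

        using var output = new MemoryStream();
        // Convert back to 16-bit PCM and write a WAV header plus data to the stream.
        WaveFileWriter.WriteWavFileToStream(output, resampler.ToWaveProvider16());
        return output.ToArray();
    }
}
```

It could be called as byte[] converted = TtsAudioConverter.Resample(result.AudioData, 44100); and the bytes then written to disk or passed to another NAudio reader. If the SpeechConfig is switched to one of the raw output formats, the data has no header, so a RawSourceWaveStream with a WaveFormat matching that output format would be needed in place of WaveFileReader.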