
How to use neural voices in Azure Direct Line Speech bot


I am trying to update the experimental DirectLineSpeech Echo Bot sample's Speak() method to use neural voices, but it doesn't seem to work.

Here's the code I am trying to get working:

public IActivity Speak(string message)
{
    var activity = MessageFactory.Text(message);
    string body = @"<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
        <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>
        <mstts:express-as type='chat'>" + $"{message}" + "</mstts:express-as></voice></speak>";
    activity.Speak = body;
    return activity;
}

This is based on the recommendation in the SSML guide.

Here's the standard (non-neural) text-to-speech version for reference:

public IActivity Speak(string message)
{
    var activity = MessageFactory.Text(message);
    string body = @"<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
        <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaRUS)'>" +
        $"{message}" + "</voice></speak>";
    activity.Speak = body;
    return activity;
}

Can someone help me understand how this works, or what I am doing wrong?

In case it points to any restrictions: I have deployed the bot as an App Service on the F1 free tier in the westus2 region.

Edit: Updated the code to use the full name, i.e. Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural), instead of the short name en-US-JessaNeural, as suggested by Nicholas. This doesn't seem to help either.


Solution

  • The exact name of the neural voice is Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural). The main issue, though, is that you are using a speaking style via mstts:express-as.

    You forgot to declare the mstts namespace on the speak element (xmlns:mstts='https://www.w3.org/2001/mstts'). Without that declaration the mstts: prefix is unbound. This:

    @"<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
        <voice name='en-US-JessaNeural'>
            <mstts:express-as type='chat'>" + $"{message}" + @"</mstts:express-as>
        </voice>
    </speak>";
    

    Should be:

    @"<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
        <voice name='en-US-JessaNeural'>
            <mstts:express-as type='chat'>" + $"{message}" + @"</mstts:express-as>
        </voice>
    </speak>";
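
    Because the snippets above build the SSML by string concatenation, a missing namespace declaration or a message containing < or & only shows up at runtime. As a rough sketch (not part of the sample), the same document can be assembled with System.Xml.Linq, which declares the mstts prefix explicitly and escapes the message text. SsmlBuilder and Build are hypothetical names introduced here for illustration. The namespace URIs, voice name and style are copied from this thread; note that the W3C SSML specification spells the synthesis namespace with http://, so check both URIs against the current SSML reference before relying on them.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the Echo Bot sample.
public static class SsmlBuilder
{
    // Assumption: URIs copied verbatim from the snippets in this thread.
    private static readonly XNamespace Ssml = "https://www.w3.org/2001/10/synthesis";
    private static readonly XNamespace Mstts = "https://www.w3.org/2001/mstts";

    // Wraps `message` in <speak><voice><mstts:express-as> and returns the SSML string.
    public static string Build(
        string message,
        string voice = "en-US-JessaNeural",
        string style = "chat")
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            // Default namespace for speak and voice.
            new XAttribute("xmlns", Ssml.NamespaceName),
            // The declaration that was missing in the question: binds the mstts prefix.
            new XAttribute(XNamespace.Xmlns + "mstts", Mstts.NamespaceName),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(Ssml + "voice",
                new XAttribute("name", voice),
                new XElement(Mstts + "express-as",
                    new XAttribute("type", style),
                    // Text content is XML-escaped when the element is serialized.
                    message)));

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

    Inside Speak(), the body assignment would then become activity.Speak = SsmlBuilder.Build(message);. The serializer emits double-quoted attributes rather than the single quotes used above, which is equivalent XML.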