The Web Speech API specification says:
text attribute
This attribute specifies the text to be synthesized and spoken for this utterance. This may be either plain text or a complete, well-formed SSML document. For speech synthesis engines that do not support SSML, or only support certain tags, the user agent or speech engine must strip away the tags they do not support and speak the text.
It does not provide an example of using the text attribute with an SSML document.
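For comparison, the plain-text case is unambiguous and works as expected; a minimal sketch:
var plain = new SpeechSynthesisUtterance();
plain.text = 'ABCD'; // plain text, no markup
speechSynthesis.speak(plain);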
I tried the following in Chrome 33:
// Pass a complete SSML document as the utterance text.
var msg = new SpeechSynthesisUtterance();
msg.text = '<?xml version="1.0"?>\r\n<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">ABCD</speak>';
speechSynthesis.speak(msg);
It did not work -- the voice attempted to narrate the XML tags. Is this code valid?
Do I have to provide an XMLDocument object instead?
I am trying to understand whether Chrome violates the specification (which should be reported as a bug), or whether my code is invalid.
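For what it's worth, the XMLDocument route can be sketched like this; DOMParser and XMLSerializer are standard DOM APIs, and since the text attribute is typed as a DOMString this only round-trips the markup back to a string before assigning it:
// Sketch: build an XMLDocument from the SSML, then serialize it back to
// a string before handing it to the utterance.
var ssml = '<?xml version="1.0"?><speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">ABCD</speak>';
var doc = new DOMParser().parseFromString(ssml, 'application/xml');
var msg = new SpeechSynthesisUtterance();
msg.text = new XMLSerializer().serializeToString(doc);
speechSynthesis.speak(msg);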
In Chrome 46 on Windows, the XML is interpreted properly as an XML document when the language is set to en; however, I see no evidence that the tags are actually doing anything. I heard no difference between the <emphasis> and non-<emphasis> versions of this SSML:
// SSML with <emphasis>; removing the tag produced no audible difference.
var msg = new SpeechSynthesisUtterance();
msg.text = '<?xml version="1.0"?>\r\n<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><emphasis>Welcome</emphasis> to the Bird Seed Emporium. Welcome to the Bird Seed Emporium.</speak>';
msg.lang = 'en';
speechSynthesis.speak(msg);
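Whether the tags have any audible effect presumably depends on the voice/engine behind the utterance rather than on Chrome itself; listing the available voices at least shows which engines are in play:
// List the available voices; SSML handling is up to the engine behind
// each voice, not the browser.
speechSynthesis.onvoiceschanged = function () {
  speechSynthesis.getVoices().forEach(function (v) {
    console.log(v.name, v.lang, v.localService ? 'local' : 'remote');
  });
};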
The <phoneme> tag was also completely ignored, which made my attempt to speak IPA fail:
// SSML with IPA <phoneme> tags; the surrounding prose was read, but the
// phonemes were not spoken.
var msg = new SpeechSynthesisUtterance();
msg.text = '<?xml version="1.0" encoding="ISO-8859-1"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> Pavlova is a meringue-based dessert named after the Russian ballerina Anna Pavlova. It is a meringue cake with a crisp crust and soft, light inside, usually topped with fruit and, optionally, whipped cream. The name is pronounced <phoneme alphabet="ipa" ph="pævˈloʊvə">...</phoneme> or <phoneme alphabet="ipa" ph="pɑːvˈloʊvə">...</phoneme>, unlike the name of the dancer, which was <phoneme alphabet="ipa" ph="ˈpɑːvləvə">...</phoneme> </speak>';
msg.lang = 'en';
speechSynthesis.speak(msg);
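As a stopgap until the tags are honored, the markup can be stripped client-side so the prose is at least read cleanly (a rough sketch; it simply discards all SSML rather than interpreting it):
// Rough fallback: parse the SSML and speak only its text content.
function speakWithoutSsml(ssml) {
  var doc = new DOMParser().parseFromString(ssml, 'application/xml');
  var msg = new SpeechSynthesisUtterance();
  msg.text = doc.documentElement ? doc.documentElement.textContent : ssml;
  msg.lang = 'en';
  speechSynthesis.speak(msg);
}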
This is despite the fact that the Microsoft Speech API does handle SSML correctly. Here is a C# snippet, suitable for use in LINQPad:
// Requires a reference to System.Speech and these namespaces:
// using System.Speech.Synthesis;
// using System.Text.RegularExpressions;
var str = "Pavlova is a meringue-based dessert named after the Russian ballerina Anna Pavlova. It is a meringue cake with a crisp crust and soft, light inside, usually topped with fruit and, optionally, whipped cream. The name is pronounced /pævˈloʊvə/ or /pɑːvˈloʊvə/, unlike the name of the dancer, which was /ˈpɑːvləvə/.";
var regex = new Regex("/([^/]+)/");
if (regex.IsMatch(str))
{
    // Wrap each /IPA/ segment in an SSML <phoneme> tag.
    str = regex.Replace(str, "<phoneme alphabet=\"ipa\" ph=\"$1\">word</phoneme>");
    str.Dump();   // LINQPad output
}

SpeechSynthesizer synth = new SpeechSynthesizer();
PromptBuilder pb = new PromptBuilder();
pb.AppendSsmlMarkup(str);
synth.Speak(pb);