actions-on-google ssml

Unwanted background noise in the German Google Assistant SSML output


I have noticed background noise when I output single digits, each followed by a <break>. I am using the German voice "female 1". The following SSML markup reproduces the behaviour:

<speak>
   <prosody rate="medium">
      <s>
         <say-as interpret-as="cardinal">0</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">1</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">2</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">3</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">4</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">5</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">6</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">7</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">8</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">9</say-as><break time="1250ms"/>
         <say-as interpret-as="cardinal">0</say-as>
      </s>
   </prosody>
</speak>

I would also like to provide a link to an MP3 (generated with the TTS Simulator, German, voice "female 1"). You can clearly hear the noise, especially after the digits 0, 2, 3, 4, 6 and 7. The effect appears to occur only when a <break> follows a <say-as interpret-as="cardinal">.

I would expect that there is no background noise at all with such SSML markup.

I use the markup above to read a telephone number to the user, because <say-as interpret-as="telephone">01234567890</say-as> spells out the digits far too fast.


Solution

  • Try wrapping each digit in its own <s> element, with each <break> placed between the sentences:

    <speak>
       <prosody rate="medium">
          <s><say-as interpret-as="cardinal">0</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">1</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">2</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">3</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">4</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">5</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">6</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">7</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">8</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">9</say-as></s><break time="1250ms"/>
          <s><say-as interpret-as="cardinal">0</say-as></s>
       </prosody>
    </speak>
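
If the telephone number is not fixed, markup of this shape could be generated instead of written by hand. The following is only a minimal sketch in plain Python string handling. The function name `digits_to_ssml` and the `pause_ms` parameter are made up for illustration and are not part of any Google SDK. It assumes that non-digit characters such as spaces or dashes should simply be skipped.

```python
def digits_to_ssml(number: str, pause_ms: int = 1250) -> str:
    """Build SSML that reads each digit of `number` as its own sentence.

    Hypothetical helper: each digit is wrapped in <s>...</s> and a <break>
    is placed between consecutive sentences, as in the markup above.
    """
    # Keep only ASCII digits; separators like spaces or dashes are dropped.
    digits = [ch for ch in number if ch in "0123456789"]
    if not digits:
        raise ValueError("number must contain at least one digit")

    pause = f'<break time="{pause_ms}ms"/>'
    sentences = [
        f'<s><say-as interpret-as="cardinal">{d}</say-as></s>' for d in digits
    ]
    # join() puts a break between sentences only, so none follows the last digit.
    body = pause.join(sentences)
    return f'<speak><prosody rate="medium">{body}</prosody></speak>'


if __name__ == "__main__":
    print(digits_to_ssml("01234567890"))
```

Called with "01234567890", this produces eleven <s> elements separated by ten <break> elements, without the indentation. Whether the generated markup avoids the noise still has to be checked by ear in the simulator.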