google-text-to-speech

Google TTS SSML introduces weird noise


I used Google TTS to generate an audio file. Here is the input SSML:

<break time="6s"/>AIRDASS<break time="1s"/> intorduction.<break time="9s"/>In this lesson, the following topics will be covered:<break time="1s"/> introduction of the AIRDASS system, <break time="1s"/>description of the main characteristics, <break time="1s"/>description of the main functionalities.

As a result, I get a very annoying high-frequency noise (a kind of "whistle") where there should be silence.

Here are my settings:

{
    audioConfig: {
        audioEncoding: "LINEAR16",
        pitch: 0,
        speakingRate: 1
    },
    input: {
        ssml
    },
    voice: {
        languageCode: "en-GB",
        name: "en-GB-Neural2-D"
    }
}

Are there any other settings I should add to prevent the noise?


Solution

  • I have exactly the same issue.

    It turns out that any voice whose name contains Neural2 has this problem, and sometimes the voice quality is poor as well.

    Switching to a Standard voice gives a stable result.

    {
        audioConfig: {
            audioEncoding: "MP3"
        },
        input: {
            ssml
        },
        voice: {
            languageCode: "en-US",
            name: "en-US-Standard-B"
        }
    }
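
If you want to keep the rest of your request and only swap the voice, something like the sketch below might help. `toStandardVoice` is a hypothetical helper, not part of any Google library. It assumes voice names follow the `<locale>-<tier>-<letter>` pattern seen above, and that a Standard voice with the same locale and letter exists, which you should check against the voice list for your language. The `<speak>` wrapper in the sample SSML is also an assumption.

```javascript
// Hypothetical helper: replace a Neural2 voice with the Standard voice that
// has the same locale and letter. Assumes names look like
// "<locale>-Neural2-<letter>"; the Standard counterpart is not guaranteed
// to exist, so verify it against the available voices.
function toStandardVoice(request) {
  const match = /^([a-z]{2,3}-[A-Z]{2})-Neural2-([A-Z])$/.exec(request.voice.name);
  if (!match) {
    return request; // not a Neural2 voice: return the request unchanged
  }
  const [, locale, letter] = match;
  // Copy the request so the caller's object is not mutated.
  return {
    ...request,
    voice: {
      ...request.voice,
      languageCode: locale,
      name: `${locale}-Standard-${letter}`,
    },
  };
}

// Sample input modelled on the question's settings.
const ssml = '<speak><break time="1s"/>AIRDASS introduction.</speak>';
const original = {
  audioConfig: { audioEncoding: "LINEAR16", pitch: 0, speakingRate: 1 },
  input: { ssml },
  voice: { languageCode: "en-GB", name: "en-GB-Neural2-D" },
};

const fallback = toStandardVoice(original);
console.log(fallback.voice.name); // prints "en-GB-Standard-D"
```

The audio settings and SSML are passed through untouched, so if the whistle disappears with the fallback request, the voice tier is the likely cause rather than the encoding or the breaks.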