node.js · google-cloud-platform · google-text-to-speech

Google Cloud TTS synthesizeLongAudio


I'm trying to use the TextToSpeechLongAudioSynthesizeClient in Node.js, but every request fails with "Request contains an invalid argument" and no further details. Does anyone have an example of how to use TextToSpeechLongAudioSynthesizeClient correctly?

const textToSpeech = require("@google-cloud/text-to-speech").v1;
const fs = require("fs");
const util = require("util");
const ttsClient = new textToSpeech.TextToSpeechLongAudioSynthesizeClient();

const [operation] = await ttsClient.synthesizeLongAudio({
  input: { text: "hello world" },
  voice: { languageCode: "en-US", name: "en-US-Neural2-J" },
  audioConfig: { audioEncoding: "MP3" },
});
const [response] = await operation.promise();
const writeFile = util.promisify(fs.writeFile);
await writeFile("output.mp3", response.audioContent, "binary");

Solution

  • As of this writing, synthesizeLongAudio requires a Cloud Storage bucket to store the generated audio file; the audio is not returned in the response, so you can't read audioContent from your code. The request also needs a parent field containing your project ID.

    Moreover, it only accepts the LINEAR16 audio encoding, so you can't get an MP3 as output either. So we need to make some changes to your code:

    const [operation] = await ttsClient.synthesizeLongAudio({
      input: { text: "hello world" },
      voice: { languageCode: "en-US", name: "en-US-Neural2-J" },
      audioConfig: { audioEncoding: "LINEAR16" },
      parent: 'projects/XXXXX/locations/global',
      outputGcsUri: 'gs://xxxxx-d36.appspot.com/audio/test.wav'
    });

    When you now make a request with this configuration, the WAV file will be written to the outputGcsUri path once the long-running operation completes. Keep in mind that a 5-minute LINEAR16 WAV file takes up roughly 15 MB. And of course, make sure you have created the bucket and granted your client's service account write access to it.
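    Putting it together, a full request might look like the sketch below. The project ID my-project, the bucket my-bucket, and the helper name buildLongAudioRequest are placeholders of my own, not part of the API; substitute your own values. The helper only assembles the request object, so its shape can be checked without calling the API.

```javascript
// Builds the request object for synthesizeLongAudio.
// Note the required parent and outputGcsUri fields, and that
// long audio synthesis currently accepts only LINEAR16 output.
function buildLongAudioRequest(projectId, outputGcsUri, text) {
  return {
    parent: `projects/${projectId}/locations/global`,
    input: { text },
    voice: { languageCode: "en-US", name: "en-US-Neural2-J" },
    audioConfig: { audioEncoding: "LINEAR16" },
    outputGcsUri,
  };
}

async function main() {
  const textToSpeech = require("@google-cloud/text-to-speech").v1;
  const client = new textToSpeech.TextToSpeechLongAudioSynthesizeClient();

  const request = buildLongAudioRequest(
    "my-project",                     // placeholder project ID
    "gs://my-bucket/audio/test.wav",  // placeholder bucket path
    "hello world"
  );

  // synthesizeLongAudio returns a long-running operation;
  // promise() resolves once the file has been written to GCS.
  const [operation] = await client.synthesizeLongAudio(request);
  await operation.promise();
  console.log("Audio written to", request.outputGcsUri);
}

// To run against your own project:
// main().catch(console.error);
```

    Since nothing comes back in the response body, you would then fetch the WAV file from your bucket (for example with the @google-cloud/storage client) if you need it locally.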