
Using PCM format of AWS Polly


I am trying to use AWS Polly (for TTS) through the JavaScript SDK from AWS Lambda, which is exposed as a REST API through API Gateway. Getting the PCM output is no problem. Here is the call flow in brief:

.NET application --> REST API (API Gateway) --> AWS Lambda (JS SDK) --> AWS Polly
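The request parameters behind this call flow are not shown in the post. As a minimal sketch, assuming the AWS SDK for JavaScript v2 parameter names for synthesizeSpeech, the params object for PCM output could be built like this. The helper name buildPcmParams and the default voice Joanna are illustrative choices, not taken from the original.

```javascript
// Hypothetical helper: builds the request object passed to polly.synthesizeSpeech().
// Field names follow the Polly SynthesizeSpeech API; voice and sample rate are example choices.
function buildPcmParams(text, voiceId) {
  return {
    OutputFormat: 'pcm',  // headerless signed 16-bit little-endian mono samples
    SampleRate: '16000',  // for pcm, Polly accepts "8000" or "16000" (passed as a string)
    Text: text,
    TextType: 'text',
    VoiceId: voiceId || 'Joanna'
  };
}

const params = buildPcmParams('Hello from Polly');
console.log(JSON.stringify(params));
```

Because pcm output carries no header, the sample rate chosen here is what a player (or Audacity's raw import) has to be told later.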

The .NET application (I am also using Postman for testing) receives the audio stream buffer in the following format:

{"type":"Buffer","data":[255,255,0,0,0,0,255,255,255,255,0,0,0,0,0,0,255,255,255,255,0,0,0,0,255,255,255,255,255,255,255,255,0,0,255,255,255,255,0,0,0,0,255,255,255,255,0,0,255,255,255, more such data]

Now I don't know how to convert this back to raw PCM. I would like to send the data back as raw PCM, but I can't find a way to do it. I also don't understand why AWS would send the data back in such a format. Using their console, one can get the audio in raw PCM format (which I can then import into Audacity), but it is not so simple with the SDK. Or am I missing something really basic?
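For illustration, this shape is how a Node.js Buffer looks after JSON.stringify, because Buffer defines a toJSON() method that emits one number (0 to 255) per byte. The sketch below is plain Node.js with no AWS calls, and the helper name bufferFromJson is hypothetical. It reproduces the shape and shows that the raw bytes can be recovered from the data array; a .NET client could do the equivalent by reading the numbers into a byte array.

```javascript
// Serializing a Buffer: Buffer.prototype.toJSON yields {"type":"Buffer","data":[...]}.
const original = Buffer.from([255, 255, 0, 0, 1, 0]); // three 16-bit PCM samples
const json = JSON.stringify(original);
console.log(json); // {"type":"Buffer","data":[255,255,0,0,1,0]}

// Hypothetical helper: parse the JSON and rebuild a Buffer from the data array.
function bufferFromJson(text) {
  const obj = JSON.parse(text);
  if (obj.type !== 'Buffer' || !Array.isArray(obj.data)) {
    throw new Error('not a serialized Buffer');
  }
  return Buffer.from(obj.data);
}

const pcm = bufferFromJson(json);
console.log(pcm.equals(original)); // true

// Read the bytes as signed 16-bit little-endian samples (Polly's pcm layout).
console.log(pcm.readInt16LE(0), pcm.readInt16LE(2), pcm.readInt16LE(4)); // -1 0 1
```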

Any suggestions/tips on this? Thanks.


Solution

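  • As Michael mentioned in the comments, returning Polly's response directly from the Lambda callback turns the audio stream into a JSON object: Lambda serializes the result with JSON.stringify, and a Node.js Buffer serializes as {"type":"Buffer","data":[...]}. Encoding the buffer received from Polly as base64 fixes this. Here is what the code looks like now: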

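    var AWS = require('aws-sdk');   // AWS SDK for JavaScript (v2)
    var polly = new AWS.Polly();

    // params and callback come from the surrounding Lambda handler
    polly.synthesizeSpeech(params, function(err, data) {
        if (err) {
            console.log(err, err.stack); // an error occurred
            return callback(err);        // report the failure instead of letting the Lambda time out
        }

        // old code:
        // callback(null, data.AudioStream); // the Buffer gets serialized to a JSON object
        // use the code below instead:
        if (data && data.AudioStream instanceof Buffer) {
            // base64 turns the binary PCM into a plain string that survives JSON serialization
            var buf = data.AudioStream.toString('base64');
            callback(null, buf);
        } else {
            callback(new Error('Polly response did not contain an AudioStream buffer'));
        }
    });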
    

    PS: I am using the AWS SDK on AWS Lambda.
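On the receiving side, the base64 string has to be decoded back into bytes. A minimal Node.js sketch follows, assuming the response body is either the bare base64 string or that string JSON-encoded in double quotes (as a non-proxy Lambda integration typically returns it), and assuming Polly's default pcm sample rate of 16000 Hz. The helper names pcmFromResponseBody and pcmToWav are hypothetical; the WAV header is optional and only makes the file playable without entering raw-import settings by hand.

```javascript
// Hypothetical helper: decode the Lambda's base64 result back into raw PCM bytes.
// Unwraps the string first if it arrives JSON-encoded (surrounded by double quotes).
function pcmFromResponseBody(body) {
  const trimmed = body.trim();
  const b64 = trimmed.startsWith('"') ? JSON.parse(trimmed) : trimmed;
  return Buffer.from(b64, 'base64');
}

// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to raw PCM.
// Defaults match Polly's pcm output: mono, signed 16-bit little-endian, 16 kHz.
function pcmToWav(pcm, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * bitsPerSample / 8;
  const byteRate = sampleRate * blockAlign;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8 bytes
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(pcm.length, 40);     // number of PCM bytes that follow
  return Buffer.concat([header, pcm]);
}

// Usage with a made-up, JSON-encoded response body (four bytes = two samples).
const body = JSON.stringify(Buffer.from([255, 255, 0, 0]).toString('base64'));
const pcm = pcmFromResponseBody(body);
const wav = pcmToWav(pcm);
console.log(pcm.length, wav.length); // 4 48
```

Writing wav to disk with require('fs').writeFileSync gives a file Audacity can open directly; the raw pcm bytes can instead be loaded through Audacity's raw-data import as signed 16-bit PCM, little-endian, mono.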