twilio audio-streaming pcm speaker mu-law

Speaker outputs bursts of static when running Node.js WebSocket server with “audio/x-mulaw” data from Twilio


I'm trying to create a Node.js WebSocket server that receives audio data in the form of a base64-encoded string from Twilio. The decoded audio data is then written to the speaker using the write() method. Twilio says that it sends data in "audio/x-mulaw" form in base64.

However, when I run the code, the speaker outputs bursts of static instead of the expected audio. I'm not sure what's causing this issue. The bursts of static do line up with my speaking into the mic, but the sound is not at all recognizable.

Here's my code:

import { WebSocketServer } from 'ws';
import Speaker from 'speaker';
import alawmulaw from 'alawmulaw';

// Create a new Speaker instance with the specified format
const speaker = new Speaker();

const wss = new WebSocketServer({ port: 5000 });

wss.on('connection', function connection(ws) {
  ws.on('message', function message(data) {
    let obj = JSON.parse(data);

    if (obj.event === "media") {
        let buff = Buffer.from(obj.media.payload, 'base64');

        let PCM = Buffer.from(alawmulaw.mulaw.decode(buff));

        speaker.write(PCM);
    }
  });
});

I'm relatively confident that it is an issue with the encoding but I have tried various configurations and nothing has worked so far. I would greatly appreciate it if anyone could share some ideas on how to work through this. Thanks!

Twilio Docs Example Server Twilio Stream Docs


Solution

  • Configure the speaker to match the decoded audio (16-bit signed mono PCM at 8000 Hz) instead of relying on its defaults:

    const speaker = new Speaker({
        float: false,           // integer PCM samples, not floating point
        signed: true,           // samples are signed
        bitDepth: 16,           // mu-law decodes to 16-bit samples
        channels: 1,            // mono audio
        sampleRate: 8000,       // Twilio streams mu-law audio at 8000 Hz
        samplesPerFrame: 1,     // samples per frame handed to the audio backend
    });
    

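    Also decode each track separately: every media message carries a track field in obj.media.track (for example inbound or outbound), and chunks from different tracks should not be interleaved into the same speaker.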
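
    One more thing worth checking when the output is static: Buffer.from() called on a typed array copies each element as a single truncated byte, so if the mu-law decoder returns an Int16Array (as alawmulaw's decode appears to), wrapping it with Buffer.from(...) loses half of every sample. The sketch below is one way to do the decoding and per-track routing without that problem. It uses a hand-written G.711 mu-law decoder rather than alawmulaw, writes little-endian samples (an assumption that matches most desktop platforms), and the names muLawDecodeSample, muLawToPcm16Buffer, handleMediaMessage and sinks are hypothetical, not part of Twilio's or any library's API.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function muLawDecodeSample(byte) {
  const u = ~byte & 0xff;              // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a buffer of mu-law bytes into a Buffer of 16-bit little-endian PCM.
// Each input byte becomes two output bytes.
function muLawToPcm16Buffer(muLawBytes) {
  const out = Buffer.alloc(muLawBytes.length * 2);
  for (let i = 0; i < muLawBytes.length; i++) {
    out.writeInt16LE(muLawDecodeSample(muLawBytes[i]), i * 2);
  }
  return out;
}

// Handle one raw WebSocket message. `sinks` is a hypothetical object that maps
// a track name (e.g. "inbound") to something with a write() method, such as
// one Speaker instance per track. Returns null for non-media events.
function handleMediaMessage(rawMessage, sinks) {
  const msg = JSON.parse(rawMessage);
  if (msg.event !== 'media') return null;

  const track = msg.media.track;
  const pcm = muLawToPcm16Buffer(Buffer.from(msg.media.payload, 'base64'));

  const sink = sinks[track];
  if (sink) sink.write(pcm);           // tracks without a sink are skipped
  return { track, pcm };
}
```

    In the server from the question, this would be called as handleMediaMessage(data, { inbound: speaker }) inside the 'message' handler, with speaker configured as shown above. If you keep using alawmulaw, something like Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength) wraps the decoded Int16Array without truncating it.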