I'm recording audio from Node.js using node-microphone (which is just a JavaScript interface for arecord), and I want to store the stream chunks in an AudioBuffer using web-audio-api (a Node.js implementation of the Web Audio API).
My audio source has two channels while my AudioBuffer has only one (on purpose).
This is my working configuration for recording audio with arecord through my USB sound card (I'm using a Raspberry Pi 3 running Raspbian Buster):
arecord -D hw:1,0 -c 2 -f S16_LE -r 44100
Running this command with an output path and playing the resulting wav file with aplay works just fine. So node-microphone is able to record audio with these parameters, and at the end I get a Node.js readable stream with wave data flowing through it.
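For reference, the full round-trip I used to check that the recording chain works (test.wav is just an arbitrary output path):

arecord -D hw:1,0 -c 2 -f S16_LE -r 44100 test.wav
aplay test.wav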
But I'm struggling to bridge the gap between the stream chunks (Buffer instances) and the AudioBuffer. More precisely: I'm not sure of the format of the incoming data, not sure of the destination format, and not sure how I would do the conversion anyway:
The stream chunks are Buffers, so they are also Uint8Arrays. Given my configuration, I guess they are binary representations of 16-bit signed integers (little endian, though I don't know exactly what that implies).
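If that guess is right, each channel sample occupies 2 bytes and Node's Buffer can already decode it. Here's a sketch of how I'd expect to read one stereo frame under that assumption (chunk stands for one Buffer emitted by the stream):

// Assumption: 16-bit signed integers, little endian, channels interleaved
for (let offset = 0; offset + 4 <= chunk.length; offset += 4) {
  const left = chunk.readInt16LE(offset)      // bytes 0-1 of the frame
  const right = chunk.readInt16LE(offset + 2) // bytes 2-3 of the frame
  // left and right should be integers in [-32768, 32767]
}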
The AudioBuffer holds multiple buffers (one per channel, so only one in my case) that I can access as Float32Arrays by calling AudioBuffer.prototype.getChannelData(). MDN also says:
The buffer contains data in the following format: non-interleaved IEEE754 32-bit linear PCM with a nominal range between -1 and +1, that is, 32bits floating point buffer, with each samples between -1.0 and 1.0.
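To make that destination format concrete, this is how I picture writing into the mono channel (what values to write is exactly my question):

const data = audioBuffer.getChannelData(0) // Float32Array, one float per sample
data[0] = 0.5   // fine: inside the nominal [-1, 1] range
data[1] = 12345 // a raw 16-bit sample would fall far outside that range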
The point is to find what I have to extract from the incoming Buffers and how I should transform it so it's suitable for the Float32Array destination (and remains valid wave data), knowing that the audio source is stereo and the AudioBuffer isn't.
My best contender so far was the Buffer.prototype.readFloatLE() method, whose name looks like it would solve my problem, but this wasn't a success (just noise).
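I suspect the reason is that readFloatLE() reinterprets the 4 bytes of two consecutive int16 samples as a single IEEE754 float, which yields nonsense. A quick illustration of that suspicion (with made-up sample values):

const frame = Buffer.alloc(4)
frame.writeInt16LE(1000, 0)  // plausible left sample
frame.writeInt16LE(-2000, 2) // plausible right sample
console.log(frame.readFloatLE(0)) // garbage: four int16 bytes parsed as one float32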
My first try (before doing any research) was to naively copy the buffer data into the Float32Array, interleaving indexes to handle the stereo-to-mono conversion. Obviously it mostly produced noise, but I could hear some of the sound I recorded (incredibly distorted but definitely present), so I guess I should mention that.
This is a simplified version of my naive try (I'm aware this is not meant to work well, I just include it in my question as a base of discussion):
import { AudioBuffer } from 'web-audio-api'
import Microphone from 'node-microphone'

const rate = 44100
const channels = 2 // Number of source channels

const microphone = new Microphone({ // These parameters result in the arecord command above
  channels,
  rate,
  device: 'hw:1,0',
  bitwidth: 16,
  endian: 'little',
  encoding: 'signed-integer'
})

const audioBuffer = new AudioBuffer(
  1, // 1 channel
  30 * rate, // 30 seconds buffer
  rate
)
const chunks = []
const data = audioBuffer.getChannelData(0) // This is the Float32Array
const stream = microphone.startRecording()

setTimeout(() => microphone.stopRecording(), 5000) // Recording for 5 seconds

stream.on('data', chunk => chunks.push(chunk))
stream.on('close', () => {
  chunks.reduce((offset, chunk) => {
    for (let index = 0; index < chunk.length; index += channels) {
      let value = 0
      for (let channel = 0; channel < channels; channel++) {
        value += chunk[index + channel]
      }
      data[(offset + index) / channels] = value / channels // Average value from the two channels
    }
    return offset + chunk.length // Since data comes as chunks, this offsets AudioBuffer's index
  }, 0)
})
I would be really grateful if you could help :)
So the input stereo signal comes as 16-bit signed integers, interleaving left and right channels, meaning that the corresponding buffers (sequences of 8-bit unsigned integers) have this layout for a single stereo sample:
[LEFT ] 8 bits (LSB)
[LEFT ] 8 bits (MSB)
[RIGHT] 8 bits (LSB)
[RIGHT] 8 bits (MSB)
Since arecord is configured with little endian format, the Least Significant Byte (LSB) comes first, and the Most Significant Byte (MSB) comes next.
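To make the little endian part concrete, here's how one 16-bit sample can be rebuilt from its two bytes by hand; this is effectively what Buffer.prototype.readInt16LE() does, sign handling included:

// lsb comes first in the buffer, msb second
const toInt16 = (lsb, msb) => {
  const unsigned = (msb << 8) | lsb // LSB holds the low 8 bits
  return unsigned >= 0x8000         // values >= 2^15 encode negatives
    ? unsigned - 0x10000            // two's complement sign extension
    : unsigned
}
// e.g. toInt16(0xE8, 0x03) === 1000, toInt16(0x30, 0xF8) === -2000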
The AudioBuffer's single channel buffer, represented by a Float32Array, expects values between -1 and 1 (one value per sample).
So to map values from the input Buffer to the destination Float32Array, I had to use the Buffer.prototype.readInt16LE(offset) method, incrementing the byte offset parameter by 4 for each sample (2 left bytes + 2 right bytes = 4 bytes), and scaling the input values from range [-32768;+32767] (the 16-bit signed integer range) to range [-1;+1]:
import { AudioBuffer } from 'web-audio-api'
import Microphone from 'node-microphone'

const rate = 44100
const channels = 2 // 2 input channels

const microphone = new Microphone({
  channels,
  rate,
  device: 'hw:1,0',
  bitwidth: 16,
  endian: 'little',
  encoding: 'signed-integer'
})

const audioBuffer = new AudioBuffer(
  1, // 1 channel
  30 * rate, // 30 seconds buffer
  rate
)
const chunks = []
const data = audioBuffer.getChannelData(0)
const stream = microphone.startRecording()

setTimeout(() => microphone.stopRecording(), 5000) // Recording for 5 seconds

stream.on('data', chunk => chunks.push(chunk))
stream.on('close', () => {
  chunks.reduce((offset, chunk) => {
    // Steps by 4 bytes: 2 bytes per channel sample × 2 channels
    for (let index = 0; index < chunk.length; index += channels * 2) {
      let value = 0
      for (let channel = 0; channel < channels; channel++) {
        // Iterates through the input channels and sums their
        // values so we can average them into a mono signal.
        // Multiplies the channel index by 2 because there are
        // 2 bytes per channel sample.
        value += chunk.readInt16LE(index + channel * 2)
      }
      // Scales the index according to the number of input channels
      // (and divides it by 2 because there are 2 bytes per channel
      // sample), then averages the channel values and scales the
      // result from range [-32768;+32767] to range [-1;+1]
      data[(offset + index) / channels / 2] = value / channels / 32768
    }
    return offset + chunk.length
  }, 0)
})
})
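A note on the 32768 divisor: the 16-bit signed range is asymmetric ([-32768;+32767]), so dividing by 32768 maps the most negative sample exactly to -1 while the most positive one lands just below +1 (32767 / 32768 ≈ 0.99997), keeping every value inside the AudioBuffer's nominal range. Averaging the two channels (rather than summing them) is also deliberate: a plain sum could exceed the [-1;+1] range whenever both channels peak simultaneously.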