Tags: javascript, audio, web-audio-api, audiocontext, audiobuffer

Pauses between chunks when streaming audio chunk by chunk


I am implementing a streaming audio feature and have run into an issue with merging audio buffers via the AudioContext. My goal is to fetch 5-second audio chunks and play them back as one continuous stream.

Here's what I've done so far:

  1. I fetch the first 5-second audio chunk, decode it, and store it in a variable of type AudioBuffer.
  2. When the user clicks the "Play" button, I fetch the remaining chunks and merge each one into that first AudioBuffer, storing the result back in the same variable.

The problem arises when playback transitions from one chunk to the next: there is a noticeable gap between the chunks.

I suspect this gap comes from the process of merging subsequent chunks into the initial AudioBuffer. As playback crosses a chunk boundary, for instance from 00:04 to 00:05, the pause becomes evident.

How can I merge audio buffers in a way that eliminates, or at least minimizes, these gaps between chunks? I want the audio to play back smoothly.

Here is a demo of the issue; click Play and you will notice the gaps:

import audios, { preBuffer } from "./data";
import { fetchDecode, mergeAudioBuffers } from "./utils";

const playButton = document.getElementById("play") as HTMLButtonElement;

let ctx: AudioContext;
let combinedAudioBuffers: AudioBuffer;
let source: AudioBufferSourceNode;
let startTime = 0;
let playbackTime = 0;

// decode first buffer before starting streaming
window.onload = async () => {
  ctx = new AudioContext();
  const arrayBuffer: ArrayBuffer = await fetchDecode(preBuffer);
  const audioBuffer: AudioBuffer = await ctx.decodeAudioData(arrayBuffer);
  combinedAudioBuffers = audioBuffer;
  const src: AudioBufferSourceNode = ctx.createBufferSource();
  src.buffer = audioBuffer;
  src.connect(ctx.destination);
  source = src;
};

playButton.addEventListener("click", async () => {
  startTime = Date.now();
  source.start(0);
  playButton.innerHTML = "Playing";
  playButton.disabled = true;

  // Decode each remaining chunk, merge it into the combined buffer,
  // and restart playback from the estimated current position.
  for (const audio of audios) {
    const arraybuffer = await fetchDecode(audio);
    const decodedBuffer = await ctx.decodeAudioData(arraybuffer);
    combinedAudioBuffers = mergeAudioBuffers(
      ctx,
      combinedAudioBuffers,
      decodedBuffer
    );

    // Estimate elapsed playback using wall-clock time.
    playbackTime = Date.now();
    const playback = (playbackTime - startTime) / 1000;

    // Stop the current source and restart the merged buffer at that offset.
    source.stop();
    source = ctx.createBufferSource();
    source.buffer = combinedAudioBuffers;
    source.connect(ctx.destination);
    source.start(0, playback);
  }
});


Solution

  • (I'm assuming your merge code is good... you didn't show it to us, so we don't know either way...)

    Generally, you can't do this sort of split-and-merge with lossy codecs, at least not without some cooperation on the encoder end.

    You're using MP3, which has the concept of a 'frame': in MPEG-1 Layer III, each frame encodes 1152 audio samples (two granules of 576). So, at a minimum, you need to split on a frame boundary, not at an arbitrary point in time.
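
    As a rough illustration of what "splitting on a frame boundary" means, here is a minimal sketch (mine, not from the question) that scans a chunk for candidate MPEG audio frame sync words. A robust splitter would also have to validate the rest of each header and step forward by the computed frame length; matching only the sync bits, as below, can yield false positives inside the audio data.

    // Sketch: find candidate MP3 frame boundaries in a fetched chunk, so a
    // split can land on a frame start instead of an arbitrary byte offset.
    function findFrameSyncOffsets(chunk: ArrayBuffer): number[] {
      const bytes = new Uint8Array(chunk);
      const offsets: number[] = [];
      for (let i = 0; i + 1 < bytes.length; i++) {
        // An MPEG audio frame header begins with eleven set bits:
        // 0xFF followed by a byte whose top three bits are 1.
        if (bytes[i] === 0xff && (bytes[i + 1] & 0xe0) === 0xe0) {
          offsets.push(i);
        }
      }
      return offsets;
    }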

    It's worse than that, though, because a frame can depend on data carried in another frame. This is the bit reservoir, and it's a sort of hack to spend more bits on complex passages and fewer on the easy stuff: a kind of VBR within a CBR stream. In any case, it means you can't correctly decode an arbitrary frame by itself; you potentially need the surrounding frames to do that.
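
    You can actually see this dependency in the bitstream. As a sketch (assuming an MPEG-1 Layer III frame), the first 9 bits of a frame's side info are the main_data_begin pointer; a nonzero value means the frame's audio data begins back in an earlier frame's bit reservoir, so the frame cannot be decoded in isolation:

    // Sketch: read the 9-bit main_data_begin field of an MPEG-1 Layer III
    // frame. The 4-byte header is followed by a 2-byte CRC when the
    // protection bit (LSB of the second header byte) is 0, then side info.
    function mainDataBegin(bytes: Uint8Array, frameOffset: number): number {
      const crcPresent = (bytes[frameOffset + 1] & 0x01) === 0;
      const sideInfo = frameOffset + 4 + (crcPresent ? 2 : 0);
      // main_data_begin is the first 9 bits of the side info.
      return (bytes[sideInfo] << 1) | (bytes[sideInfo + 1] >>> 7);
    }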

    Additionally, a normal MP3 stream has no way to signal the decoder's startup delay, so gapless playback of MP3 isn't possible without some extra metadata. Encoders normally insert a couple of frames' worth of silence to allow the decoder to initialize; LAME, for example, records the exact encoder delay and padding in its info tag so players can trim them.
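
    If you can recover those numbers, you can trim the priming and padding from each decoded chunk before stitching. A minimal sketch, where delaySamples and paddingSamples are placeholders that must come from real encoder metadata, not guesses:

    // Sketch: drop the encoder delay and trailing padding from a decoded
    // chunk so that consecutive chunks butt together without silence.
    function trimDecodedChunk(
      ctx: AudioContext,
      buffer: AudioBuffer,
      delaySamples: number,
      paddingSamples: number
    ): AudioBuffer {
      const length = buffer.length - delaySamples - paddingSamples;
      const trimmed = ctx.createBuffer(
        buffer.numberOfChannels,
        length,
        buffer.sampleRate
      );
      for (let ch = 0; ch < buffer.numberOfChannels; ch++) {
        const data = buffer.getChannelData(ch);
        // Keep only the samples between the priming silence and the padding.
        trimmed.copyToChannel(
          data.subarray(delaySamples, delaySamples + length),
          ch
        );
      }
      return trimmed;
    }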

    So, all that being said: stitching decoded MP3 chunks together by stopping and restarting one ever-growing buffer is fighting the format. Split on frame boundaries (or, better, serve whole files or a container designed for gapless concatenation), trim the decoder priming if you can, and schedule each decoded chunk back-to-back on the AudioContext clock instead of restarting playback.
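
    Here is a minimal sketch of that scheduling pattern, reusing the question's fetchDecode helper and audios list (assumptions on my part; your helpers may differ). Note it only removes the gaps the restart logic introduces; truly gapless MP3 still needs the frame and priming handling described above.

    import audios, { preBuffer } from "./data";
    import { fetchDecode } from "./utils";

    const ctx = new AudioContext();
    let nextStartTime = 0;

    // Queue one decoded chunk at a sample-accurate time on the context clock.
    function scheduleChunk(buffer: AudioBuffer): void {
      const src = ctx.createBufferSource();
      src.buffer = buffer;
      src.connect(ctx.destination);
      if (nextStartTime === 0) {
        // Start the first chunk slightly in the future to absorb jitter.
        nextStartTime = ctx.currentTime + 0.1;
      }
      src.start(nextStartTime);
      nextStartTime += buffer.duration; // next chunk begins where this ends
    }

    async function playAll(): Promise<void> {
      await ctx.resume(); // autoplay policies require a user gesture first
      for (const url of [preBuffer, ...audios]) {
        const arrayBuffer = await fetchDecode(url);
        const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
        scheduleChunk(audioBuffer);
      }
    }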