I am currently working on a streaming audio feature and have run into an issue with merging audio buffers using the AudioContext API. My goal is to fetch 5-second audio chunks and play them back as one continuous audio stream.
Here's what I've done so far:

1. Fetched and decoded an initial chunk into an AudioBuffer, then started playback.
2. Fetched and decoded each subsequent chunk, merged it with the existing AudioBuffer, and re-stored the result in the same variable.

The problem arises when transitioning from one chunk to another during playback: there is a noticeable pause between the chunks.
I suspect that this gap comes from the process of merging each subsequent chunk into the initial AudioBuffer. As playback progresses from, for instance, 00:04 to 00:05, the pause becomes evident.
How can I merge audio buffers in a way that eliminates, or at least minimizes, these gaps between chunks? I want smooth, continuous playback of this audio.
Here is a demo of the issue; click play and you will notice the gaps:
import audios, { preBuffer } from "./data";
import { fetchDecode, mergeAudioBuffers } from "./utils";

const playButton = document.getElementById("play") as HTMLButtonElement;

let ctx: AudioContext;
let combinedAudioBuffers: AudioBuffer;
let source: AudioBufferSourceNode;
let startTime = 0;
let playbackTime = 0;

// Decode the first buffer before starting to stream
window.onload = async () => {
  ctx = new AudioContext();
  const arrayBuffer: ArrayBuffer = await fetchDecode(preBuffer);
  const audioBuffer: AudioBuffer = await ctx.decodeAudioData(arrayBuffer);
  combinedAudioBuffers = audioBuffer;

  const src: AudioBufferSourceNode = ctx.createBufferSource();
  src.buffer = audioBuffer;
  src.connect(ctx.destination);
  source = src;
};

playButton.addEventListener("click", async () => {
  startTime = Date.now();
  source.start(0);
  playButton.innerHTML = "Playing";
  playButton.disabled = true;

  // Decode all the URL chunks, merge into the combined AudioBuffer, and keep playing
  for (const audio of audios) {
    const arraybuffer = await fetchDecode(audio);
    const decodedBuffer = await ctx.decodeAudioData(arraybuffer);
    combinedAudioBuffers = mergeAudioBuffers(ctx, combinedAudioBuffers, decodedBuffer);

    // Restart playback of the merged buffer from the current position
    playbackTime = Date.now();
    const playback = (playbackTime - startTime) / 1000;
    source.stop();
    source = ctx.createBufferSource();
    source.buffer = combinedAudioBuffers;
    source.connect(ctx.destination);
    source.start(0, playback);
  }
});
(I'm assuming your merge code is good... you didn't show it to us, so we don't know either way...)
Generally, you can't do this sort of split-and-merge with lossy codecs, at least without some cooperation on the encoder end.
You're using MP3, which is organized into 'frames'; an MPEG-1 Layer III frame encodes 1152 audio samples (two granules of 576). So, at minimum, you need to split on a frame boundary, not at an arbitrary amount of time.
It's worse than that, though, because a frame can depend on a chunk of data in another frame. This is the bit reservoir, and it's a sort of hack to use more bits for complex passages and fewer bits for the easy stuff. Sort of a VBR within a CBR stream. In any case, it means that you can't correctly decode an arbitrary frame by itself. You potentially need surrounding frames to do that.
Additionally, a normal MP3 stream doesn't have any way to signal the decoder to delay, so gapless playback of MP3 is not possible without some modifications. The encoders normally insert a couple frames of silence to allow for initializing the decoder.
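For reference, those frame boundaries are computable from each frame header, so splitting on them is mechanical even if the bit-reservoir problem remains. A sketch of the standard Layer III frame-length formula (plain TypeScript, independent of your code):

```typescript
// MPEG-1 Layer III frame length in bytes, from the standard formula:
//   floor(144 * bitrate / sampleRate) + padding
// (144 comes from 1152 samples per frame divided by 8 bits per byte.)
function mp3FrameLength(
  bitrate: number,    // bits per second, e.g. 128000
  sampleRate: number, // Hz, e.g. 44100
  padding: 0 | 1      // the padding bit from the frame header
): number {
  return Math.floor((144 * bitrate) / sampleRate) + padding;
}

// A CBR stream at 128 kbit/s / 44.1 kHz has 417- or 418-byte frames,
// so valid split points fall only on sums of frame lengths — and even
// then the bit reservoir can reach across the cut.
```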
So, all that being said:
Are you actually sure you need chunked delivery? Browsers are good at streaming on their own. Even if you need to do some tweaking of the stream, you can use Media Source Extensions (MSE).
If you must use chunks for some reason, consider using HLS. It's a well-implemented standard that generally uses AAC in MP4/ISOBMFF files for audio. Then you don't have to re-implement any of this, on either the encoding or the decoding side.
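If you do go the MediaSource route, the key discipline is appending one chunk at a time and waiting for each `updateend` event before the next append; the media element then plays one continuous stream and the decoder handles frame boundaries itself. A minimal sketch, assuming a fetch helper that returns raw `ArrayBuffer`s (the names here are placeholders, not a definitive implementation):

```typescript
// The subset of SourceBuffer this sketch relies on, so the append logic
// can be exercised outside a browser as well.
interface AppendTarget {
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(
    type: "updateend",
    cb: () => void,
    opts?: { once: boolean }
  ): void;
}

// Append chunks strictly one at a time: a SourceBuffer rejects a new
// appendBuffer() call until the previous append has fired "updateend".
async function appendSequentially(
  sb: AppendTarget,
  chunks: ArrayBuffer[]
): Promise<void> {
  for (const chunk of chunks) {
    const done = new Promise<void>((resolve) =>
      sb.addEventListener("updateend", () => resolve(), { once: true })
    );
    sb.appendBuffer(chunk);
    await done;
  }
}

// Browser wiring (sketch; fetchChunks stands in for your own fetch logic):
//
// const audio = document.querySelector("audio")!;
// const ms = new MediaSource();
// audio.src = URL.createObjectURL(ms);
// ms.addEventListener("sourceopen", async () => {
//   // Check MediaSource.isTypeSupported(...) for your actual container/codec.
//   const sb = ms.addSourceBuffer("audio/mpeg");
//   await appendSequentially(sb, await fetchChunks());
//   ms.endOfStream();
// });
```

Because the element sees a single stream, there is no per-chunk decoder re-initialization, which is what produces the gaps in the AudioBuffer-merging approach.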