Tags: javascript, web-audio-api, latency, audiocontext

What is the difference between 'baseLatency' and 'outputLatency' when calling AudioBufferSourceNode.start()?


I'm working on a JavaScript browser application that requires very precise timing of audio playback via the AudioBufferSourceNode.start() method. I'm unsure how I should treat 'baseLatency' and 'outputLatency' to achieve this. I'll first describe my goal and then attempt to answer my own question with my current understanding of these two latencies. However, I'm looking for validation or correction of my interpretation.


I'm generating an audio "beep" signal using an AudioBufferSourceNode in JavaScript. I want this signal to be "heard" by a human listener at precisely N seconds in units of AudioContext.currentTime. Note that I'm not saying "played" but rather "heard": I want to account for latencies such that the output audio lines up with exactly N seconds as perceived by the listener.

My current interpretations of 'baseLatency' and 'outputLatency' are as follows:

  • baseLatency: the delay introduced by the browser's own audio pipeline, i.e. the time between the Web Audio graph producing audio and that audio being handed off to the operating system.
  • outputLatency: the delay introduced by the operating system and audio hardware, i.e. the time between that handoff and the sound actually reaching the listener's ears.

So here's my current strategy to achieve my goal using these interpretations: my JavaScript function should schedule the audio to play at some point in the future, via the .start(when) method, to offset baseLatency. Essentially, if I schedule well in advance, the audio will reach the OS regardless of baseLatency. However, I would still need to adjust the 'when' time to account for outputLatency, so I should schedule the audio as .start(N - outputLatency).

In summary, I'm treating baseLatency as a measure of how early I need to issue my .start() call relative to N, and using outputLatency to adjust the time at which the audio is played (i.e. .start(N - outputLatency)).
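In code, that strategy would look roughly like this (N, audioCtx and beepSource are placeholders; beepSource is the AudioBufferSourceNode carrying the beep):

    const audioCtx = new AudioContext();
    // ... create beepSource (an AudioBufferSourceNode) and connect it to audioCtx.destination ...
    // Issue this call at least baseLatency ahead of time, and shift the
    // scheduled time by outputLatency so the beep is heard at exactly N:
    beepSource.start(N - audioCtx.outputLatency);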


Solution

  • In a related issue, Paul Adenot (who works on Mozilla's audio implementation) stated in 2022:

    #2397 (comment) suggests that start(currentTime + baseLatency) should reliably play a sound at the indicated time, but that's not true in all browsers. If it were, playing a click sound effect would be easy:

    The linked message is not correct. baseLatency is useful to know if the Web Audio API implementation buffers internally. outputLatency is useful to understand the latency induced by the operating system / hardware. Firefox doesn't buffer audio (ever), so baseLatency is zero. The graph processing is directly serviced from the real-time audio callback the OS calls. Summing the two numbers allows knowing the total latency (for example for syncing visuals).

    Note that since then, Firefox seems to have implemented buffering as well; at least, baseLatency is no longer 0.

    So regarding your understanding: no, start() does not account for baseLatency, and you'd have to account for it yourself by calling start(currentTime + delay - (baseLatency + outputLatency)).
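
    As a concrete sketch (ctx, N and beepBuffer are placeholders here; note that outputLatency is not exposed by every browser, hence the fallback to 0):

    const ctx = new AudioContext();
    // Total delay between the graph rendering a sample and the listener hearing it.
    // outputLatency is not implemented everywhere, so fall back to 0 when absent.
    const totalLatency = ctx.baseLatency + (ctx.outputLatency ?? 0);
    // Schedule the beep so it is *heard* at time N (in AudioContext time).
    const source = new AudioBufferSourceNode(ctx, { buffer: beepBuffer });
    source.connect(ctx.destination);
    source.start(N - totalLatency);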


    Now, the issue goes further in explaining a discrepancy between implementations: Chrome does not lock the AudioContext's graph to JS execution, while Firefox and Safari do, in somewhat different ways...

    const ctx = new AudioContext();
    // Helper: create an oscillator at the given frequency, wired to the output.
    const createOsc = (freq) => {
      const node = new OscillatorNode(ctx);
      node.frequency.value = freq;
      node.connect(ctx.destination);
      return node;
    };
    document.querySelector("button").onclick = (e) => {
      const osc1 = createOsc(440);
      const osc2 = createOsc(1000);
      const p1 = performance.now();
      const beginTime = ctx.currentTime;
      osc1.start(beginTime);
      osc1.stop(beginTime + 1);
      // Busy-wait for 2 seconds to block the JS thread.
      while (performance.now() - p1 < 2000) {}
      // If the audio clock advanced while JS was blocked, this logs ~2 (Chrome);
      // if the graph is locked to JS execution, it logs 0 (Firefox, Safari).
      console.log(ctx.currentTime - beginTime);
      osc2.start(ctx.currentTime);
      osc2.stop(ctx.currentTime + 1);
    };
    <button>start test</button>

    If you run the above snippet in Chrome, you hear a one-second beep at 440 Hz, then another one at 1 kHz about a second later, and the console logs a number close to 2.
    In Firefox, the console logs 0 and you hear nothing.
    In Safari, the console logs 0 and you hear both beeps playing simultaneously for one second.

    This means that if you plan on having multiple nodes synced, you still need to account for JS execution between your calls: either be sure the JS that runs between them takes less than baseLatency + outputLatency, or add enough extra delay to cover it.
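
    For instance, a common guard is to capture a single reference time and add a safety margin (a sketch; the 50 ms margin is an arbitrary value, and oscA / oscB are placeholder nodes):

    // Capture one clock value and add a margin large enough to cover the JS
    // that runs between the scheduling calls.
    const margin = 0.05; // 50 ms, arbitrary; must exceed your worst-case JS execution time
    const startAt = ctx.currentTime + margin;
    oscA.start(startAt);
    oscB.start(startAt); // both nodes share the same captured start time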


    And a final note: while baseLatency + outputLatency tells you when the sound will be heard, if your goal is to sync with a video signal, I don't think there is a way to know when that video signal will actually be displayed.