javakotlinaudio-recordingjavax.sound.sampled

Sudden delay while recording audio over long time periods inside the JVM


I'm implementing an application which records and analyzes audio in real time (or at least as close to real time as possible), using the JDK Version 8 Update 201. While performing a test which simulates typical use cases of the application, I noticed that after several hours of recording audio continuously, a sudden delay of somewhere between one and two seconds was introduced. Up until this point there was no noticeable delay. It was only after this critical point of recording for several hours when this delay started to occur.

What I've tried so far

To check if my code for timing the recording of the audio samples is wrong, I commented out everything related to timing. This left me essentially with this update loop which fetches audio samples as soon as they are ready (Note: Kotlin code):

while (!isInterrupted) {
    val audioData = read(sampleSize, false)
    listener.audioFrameCaptured(audioData)
}

This is my read method:

fun read(samples: Int, buffered: Boolean = true): AudioData {
    //Allocate a byte array in which the read audio samples will be stored.
    val bytesToRead = samples * format.frameSize
    val data = ByteArray(bytesToRead)

    //Calculate the maximum amount of bytes to read during each iteration.
    val bufferSize = (line.bufferSize / BUFFER_SIZE_DIVIDEND / format.frameSize).roundToInt() * format.frameSize
    val maxBytesPerCycle = if (buffered) bufferSize else bytesToRead

    //Read the audio data in one or multiple iterations.
    var bytesRead = 0
    while (bytesRead < bytesToRead) {
        bytesRead += (line as TargetDataLine).read(data, bytesRead, min(maxBytesPerCycle, bytesToRead - bytesRead))
    }

    return AudioData(data, format)
}

However, even without any timing from my side the problem was not resolved. Therefore, I went on to experiment a bit and let the application run using different audio formats, which lead to very confusing results (I'm going to use a PCM signed 16 bit stereo audio format with little endian and a sample rate of 44100.0 Hz as default, unless specified otherwise):

  1. The critical amount of time that has to pass before the delay appears seems to be different depending on the machine used. On my Windows 10 desktop PC it is somewhere between 6.5 and 7 hours. On my laptop (also using Windows 10) however, it is somewhere between 4 and 5 hours for the same audio format.
  2. The amount of audio channels used seems to have an effect. If I change the amount of channels from stereo to mono, the time before the delay starts to appear is doubled to somewhere between 13 and 13.5 hours on my desktop.
  3. Decreasing the sample size from 16 bits to 8 bits also results in a doubling of the time before the delay starts to appear. Somewhere between 13 and 13.5 hours on my desktop.
  4. Changing the byte order from little endian to big endian has no effect.
  5. Switching from stereomix to a physical microphone has no effect either.
  6. I tried opening the line using different buffer sizes (1024, 2048 and 3072 sample frames) as well as its default buffer size. This also didn't change anything.
  7. Flushing the TargetDataLine after the delay has started to occur results in all bytes being zero for approximately one to two seconds. After this I get non-zero values again. The delay, however, is still there. If I flush the line before the critical point, I don't get those zero-bytes.
  8. Stopping and restarting the TargetDataLine after the delay appeared also does not change anything.
  9. Closing and reopening the TargetDataLine, however, does get rid of the delay until it reappears after several hours from there on.
  10. Automatically flushing the TargetDataLines internal buffer every ten minutes does not help to resolve the issue. Therefore, a buffer overflow in the internal buffer does not seem to be the cause.
  11. Using a parallel garbage collector to avoid application freezes also does not help.
  12. The used sample rate seems to be important. If I double the sample rate to 88200 Hertz, the delay starts occurring somewhere between 3 and 3.5 hours of runtime.
  13. If I let it run under Linux using my "default" audio format, it still runs fine after about 9 hours of runtime.

Conclusions that I've drawn:

These results let me come to the conclusion that the time for which I can record audio before this issue starts to happen is dependent on the machine on which the application is run and dependent on the byte rate (i.e. frame size and sample rate) of the audio format. This seems to hold true (although I can't completely confirm this as of now) because if I combine the changes made in 2 and 3, I would assume that I can record audio samples for four times as long (which would be somewhere between 26 and 27 hours) as when using my "default" audio format before the delay starts to appear. As I didn't find the time to let the application run for this long yet, I can only tell that it did run fine for about 15 hours before I had to stop it due to time constraints on my side. So, this hypothesis is still to be confirmed or denied.

According to the result of bullet point 13, it seems like the whole issue only appears when using Windows. Therefore, I think that it might be a bug in the platform specific parts of the javax.sound.sampled API.

Even though I think I might have found a way to change when this issue starts to happen, I'm not satisfied with the result. I could periodically close and reopen the line to avoid the problem from starting to appear at all. However, doing this would result in some arbitrary small amount of time where I wouldn't be able to capture audio samples. Furthermore, the Javadoc states that some lines can't be reopened at all after being closed. Therefore, this is not a good solution in my case.

Ideally, this whole issue shouldn't be happening at all. Is there something I am completely missing or am I experiencing limitations of what is possible with the javax.sound.sampled API? How can I get rid of this issue at all?

Edit: By suggestion of Xtreme Biker and gidds I created a small example application. You can find it inside this Github repository.


Solution

  • I have (a rather) vast experience with Java audio interfacing. Here are a few points that may be useful in guiding you towards a proper solution:

    1. It's not a matter of JVM version - the java audio system have barely been upgraded since Java 1.3 or 1.5
    2. The java audio system is a poor-man's wrapper around whatever audio interface API the operating system has to offer. In linux it's the Pulseaudio library, For windows, it's the direct show audio API (if I'm not mistaken about the latter).
    3. Again, the audio system API is kind of a legacy API - some of the features are not working or not implemented, other behaviors are straight out weird, as they are dependent on an obsolete design (I can provide examples, if required).
    4. It's not a matter of Garbage Collection - If your definition of "delay" is what I understand it to be (audio data is delayed by 1-2 seconds, meaning you start hearing stuff 1-2 seconds later), well, the garbage collector cannot cause blank data to magically be captured by the target data line and then append data as usual in an 2 seconds worth byte offset.
    5. What's most likely happening here is either the hardware or driver providing you with 2 seconds worth of garbled data at some point, and then, streams the rest of the data as usual, resulting in the "delay" you are experiencing.
    6. The fact that it works perfectly on linux means it's not a hardware issue, but rather a driver related one.
    7. To affirm that suspicion, you can try capturing audio via FFmpeg for the same duration and see if the issue is reproduced.
    8. If you are using specialized audio capturing hardware, better approach your hardware manufacturer and inquire him about the issue you are facing on windows.
    9. In any case, when writing an audio capturing application from scratch I'd strongly suggest keeping away from the Java audio-system if possible. It's nice for POCs, but it's an un-maintained legacy API. JNA is always a viable option (I've used it in Linux with ALSA/Pulse-audio to control audio hardware attributes the Java audio system could not alter), so you could look for audio capturing examples in C++ for windows and translate them to Java. It'll give you fine grain control over audio capture devices, much more than what the JVM provide OOTB. If you want to have a look at a living/breathing usable JNA example, check out my JNA AAC encoder project.
    10. Again, if you use special capturing harwdare, there's a good chance the manufacturer already provides it's own low-level C api for interfacing with the hardware, and you should consider having a look at it as well.
    11. If that's not the case, maybe you and your company/client should consider using specialized capturing hardware (doesn't have to be that expensive).