ios swift core-audio audiounit audiotoolbox

Low latency audio output problems on iOS (aka How to beat AUAudioUnit sampleRate, maximumFramesToRender, and ioBufferDuration into submission)


Okay, I'm clearly missing some important piece here. I'm trying to do low-latency audio across the network, and my fundamental frames are 10ms. I expected this to be no problem. My target is an iPhone X's built-in speakers--so my hardware sample rate should be locked to 48000Hz. I'm requesting 10ms, which divides that rate evenly and should come out to 480, 960, 1920, or 3840 depending upon how you want to slice frames/samples/bytes.
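The arithmetic behind those numbers, spelled out as a sanity check (my own sketch, not from the original post; it assumes stereo, non-interleaved Float32 as in the code further down):

```swift
// Sanity-check arithmetic for a 10 ms buffer at 48 kHz.
let sampleRate = 48_000.0
let requestedDuration = 0.010

let frames = Int(sampleRate * requestedDuration)   // 480 frames
let stereoSamples = frames * 2                     // 960 samples across 2 channels
let bytesPerChannelFloat32 = frames * 4            // 1920 bytes per non-interleaved Float32 buffer
let bytesStereoFloat32 = stereoSamples * 4         // 3840 bytes total

// What the hardware actually delivers instead: a power-of-2 buffer.
let actualDuration = 512.0 / sampleRate            // ~0.010667 s, the value in the log
```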

Yet, for the life of me, I absolutely cannot get iOS to do anything I regard as sane. I get a 10.667ms buffer duration, which is ludicrous--iOS is going out of its way to give me buffer sizes that don't divide evenly into the sample rate. Even worse, the frame is slightly LONG, which means that I have to absorb not one but two packets of latency in order to be able to fill that buffer. I can't get maximumFramesToRender to change at all, and the system is returning 0 as my sample rate even though it quite plainly is rendering at 48000Hz.

I'm clearly missing something important--what is it? Did I forget to disconnect/connect something in order to get a direct hardware mapping? (My format is 1, which is pcmFormatFloat32--I would expect pcmFormatInt16 or pcmFormatInt32 for mapping directly to hardware, so something in the OS is probably getting in the way.) Pointers are appreciated and I'm happy to go read more. Or is AUAudioUnit simply half-baked and I need to go back to older, more useful APIs? Or did I completely miss the plot, and low-latency audio folks use a whole different set of audio management functions?
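For reference, the "Bus format: 1" in the log below is `AVAudioCommonFormat.rawValue`; a small decoder (my own helper, not from the original code) makes the mapping explicit:

```swift
import AVFoundation

// Decode AVAudioCommonFormat raw values as printed by the log statement
// os_log(..., "Bus format: %d", bus0.format.commonFormat.rawValue).
func formatName(_ raw: UInt) -> String {
    switch AVAudioCommonFormat(rawValue: raw) {
    case .pcmFormatFloat32: return "pcmFormatFloat32"   // rawValue 1
    case .pcmFormatFloat64: return "pcmFormatFloat64"   // rawValue 2
    case .pcmFormatInt16:   return "pcmFormatInt16"     // rawValue 3
    case .pcmFormatInt32:   return "pcmFormatInt32"     // rawValue 4
    default:                return "otherFormat"        // rawValue 0 or unknown
    }
}
```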

Thanks for the help--it's much appreciated.

Output from code:

2019-11-07 23:28:29.782786-0800 latencytest[3770:50382] Ready to receive user events
2019-11-07 23:28:34.727478-0800 latencytest[3770:50382] Start button pressed
2019-11-07 23:28:34.727745-0800 latencytest[3770:50382] Launching auxiliary thread
2019-11-07 23:28:34.729278-0800 latencytest[3770:50445] Thread main started
2019-11-07 23:28:35.006005-0800 latencytest[3770:50445] Sample rate: 0
2019-11-07 23:28:35.016935-0800 latencytest[3770:50445] Buffer duration: 0.010667
2019-11-07 23:28:35.016970-0800 latencytest[3770:50445] Number of output busses: 2
2019-11-07 23:28:35.016989-0800 latencytest[3770:50445] Max frames: 4096
2019-11-07 23:28:35.017010-0800 latencytest[3770:50445] Can perform output: 1
2019-11-07 23:28:35.017023-0800 latencytest[3770:50445] Output Enabled: 1
2019-11-07 23:28:35.017743-0800 latencytest[3770:50445] Bus channels: 2
2019-11-07 23:28:35.017864-0800 latencytest[3770:50445] Bus format: 1
2019-11-07 23:28:35.017962-0800 latencytest[3770:50445] Bus rate: 0
2019-11-07 23:28:35.018039-0800 latencytest[3770:50445] Sleeping 0
2019-11-07 23:28:35.018056-0800 latencytest[3770:50445] Buffer count: 2 4096
2019-11-07 23:28:36.023220-0800 latencytest[3770:50445] Sleeping 1
2019-11-07 23:28:36.023400-0800 latencytest[3770:50445] Buffer count: 190 389120
2019-11-07 23:28:37.028610-0800 latencytest[3770:50445] Sleeping 2
2019-11-07 23:28:37.028790-0800 latencytest[3770:50445] Buffer count: 378 774144
2019-11-07 23:28:38.033983-0800 latencytest[3770:50445] Sleeping 3
2019-11-07 23:28:38.034142-0800 latencytest[3770:50445] Buffer count: 566 1159168
2019-11-07 23:28:39.039333-0800 latencytest[3770:50445] Sleeping 4
2019-11-07 23:28:39.039534-0800 latencytest[3770:50445] Buffer count: 756 1548288
2019-11-07 23:28:40.041787-0800 latencytest[3770:50445] Sleeping 5
2019-11-07 23:28:40.041943-0800 latencytest[3770:50445] Buffer count: 944 1933312
2019-11-07 23:28:41.042878-0800 latencytest[3770:50445] Sleeping 6
2019-11-07 23:28:41.043037-0800 latencytest[3770:50445] Buffer count: 1132 2318336
2019-11-07 23:28:42.048219-0800 latencytest[3770:50445] Sleeping 7
2019-11-07 23:28:42.048375-0800 latencytest[3770:50445] Buffer count: 1320 2703360
2019-11-07 23:28:43.053613-0800 latencytest[3770:50445] Sleeping 8
2019-11-07 23:28:43.053771-0800 latencytest[3770:50445] Buffer count: 1508 3088384
2019-11-07 23:28:44.058961-0800 latencytest[3770:50445] Sleeping 9
2019-11-07 23:28:44.059119-0800 latencytest[3770:50445] Buffer count: 1696 3473408

Actual code:

import UIKit

import os.log

import Foundation
import AudioToolbox
import AVFoundation

class AuxiliaryWork: Thread {
    let II_SAMPLE_RATE = 48000

    var iiStopRequested: Int32 = 0;  // Int32 is normally guaranteed to be atomic on most architectures

    var iiBufferFillCount: Int32 = 0;
    var iiBufferByteCount: Int32 = 0;

    func requestStop() {
        iiStopRequested = 1;
    }

    func myAVAudioSessionInterruptionNotificationHandler(notification: Notification ) -> Void {
        os_log(OSLogType.info, "AVAudioSession Interrupted: %s", notification.debugDescription)
    }

    func myAudioUnitProvider(actionFlags: UnsafeMutablePointer<AudioUnitRenderActionFlags>, timestamp: UnsafePointer<AudioTimeStamp>,
                             frameCount: AUAudioFrameCount, inputBusNumber: Int, inputData: UnsafeMutablePointer<AudioBufferList>) -> AUAudioUnitStatus {
        let ppInputData = UnsafeMutableAudioBufferListPointer(inputData)
        let iiNumBuffers = ppInputData.count

        if (iiNumBuffers > 0) {
            assert(iiNumBuffers == 2)

            for bbBuffer in ppInputData {
                assert(Int(bbBuffer.mDataByteSize) == 2048)  // FIXME: This should be 960 or 1920 ...

                iiBufferFillCount += 1
                iiBufferByteCount += Int32(bbBuffer.mDataByteSize)

                memset(bbBuffer.mData, 0, Int(bbBuffer.mDataByteSize))  // Just send silence

            }
        } else {
            os_log(OSLogType.error, "Zero buffers from system")
            assert(iiNumBuffers != 0)  // Force crash since os_log would cause an audio hiccup due to locks anyway
        }

        return noErr
    }

    override func main() {
        os_log(OSLogType.info, "Thread main started")

#if os(iOS)
        let kOutputUnitSubType = kAudioUnitSubType_RemoteIO
#else
        let kOutputUnitSubType = kAudioUnitSubType_HALOutput
#endif

        let audioSession = AVAudioSession.sharedInstance()  // FIXME: Causes the following message No Factory registered for id
        try! audioSession.setCategory(AVAudioSession.Category.playback, options: [])
        try! audioSession.setMode(AVAudioSession.Mode.measurement)

        try! audioSession.setPreferredSampleRate(48000.0)
        try! audioSession.setPreferredIOBufferDuration(0.010)

        NotificationCenter.default.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: nil,
            queue: nil,
            using: myAVAudioSessionInterruptionNotificationHandler
        )

        let ioUnitDesc = AudioComponentDescription(
            componentType: kAudioUnitType_Output,
            componentSubType: kOutputUnitSubType,
            componentManufacturer: kAudioUnitManufacturer_Apple,
            componentFlags: 0,
            componentFlagsMask: 0)

        let auUnit = try! AUAudioUnit(componentDescription: ioUnitDesc,
                                      options: AudioComponentInstantiationOptions())

        auUnit.outputProvider = myAudioUnitProvider;
        auUnit.maximumFramesToRender = 256


        try! audioSession.setActive(true)

        try! auUnit.allocateRenderResources()  // Make sure audio unit has hardware resources--we could provide the buffers from the circular buffer if we want
        try! auUnit.startHardware()


        os_log(OSLogType.info, "Sample rate: %d", audioSession.sampleRate);
        os_log(OSLogType.info, "Buffer duration: %f", audioSession.ioBufferDuration);

        os_log(OSLogType.info, "Number of output busses: %d", auUnit.outputBusses.count);
        os_log(OSLogType.info, "Max frames: %d", auUnit.maximumFramesToRender);


        os_log(OSLogType.info, "Can perform output: %d", auUnit.canPerformOutput)
        os_log(OSLogType.info, "Output Enabled: %d", auUnit.isOutputEnabled)
        //os_log(OSLogType.info, "Audio Format: %p", audioFormat)

        let bus0 = auUnit.outputBusses[0]
        os_log(OSLogType.info, "Bus channels: %d", bus0.format.channelCount)
        os_log(OSLogType.info, "Bus format: %d", bus0.format.commonFormat.rawValue)
        os_log(OSLogType.info, "Bus rate: %d", bus0.format.sampleRate)

        for ii in 0..<10 {
            if (iiStopRequested != 0) {
                os_log(OSLogType.info, "Manual stop requested");
                break;
            }

            os_log(OSLogType.info, "Sleeping %d", ii);
            os_log(OSLogType.info, "Buffer count: %d %d", iiBufferFillCount, iiBufferByteCount)
            Thread.sleep(forTimeInterval: 1.0);
        }

        auUnit.stopHardware()
    }
}

class FirstViewController: UIViewController {
    var thrAuxiliaryWork: AuxiliaryWork? = nil;

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }

    @IBAction func startButtonPressed(_ sender: Any) {
        os_log(OSLogType.error, "Start button pressed");
        os_log(OSLogType.error, "Launching auxiliary thread");

        thrAuxiliaryWork = AuxiliaryWork();
        thrAuxiliaryWork?.start();
    }

    @IBAction func stopButtonPressed(_ sender: Any) {
        os_log(OSLogType.error, "Stop button pressed");
        os_log(OSLogType.error, "Manually stopping auxiliary thread");
        thrAuxiliaryWork?.requestStop();
    }

    @IBAction func muteButtonPressed(_ sender: Any) {
        os_log(OSLogType.error, "Mute button pressed");
    }

    @IBAction func unmuteButtonPressed(_ sender: Any) {
        os_log(OSLogType.error, "Unmute button pressed");
    }
}

Solution

  • You cannot beat iOS silicon hardware into submission by assuming the API will do it for you. You have to do your own buffering if you want to abstract the hardware.

    For the very best (lowest) latencies, your software will have to (potentially dynamically) adapt to the actual hardware capabilities, which can vary from device to device, and mode to mode.

    The hardware sample rate appears to be either 44.1ksps (older iOS devices), 48ksps (newer arm64 iOS devices), or an integer multiple thereof (and potentially other rates when plugging in non-AirPod Bluetooth headsets, or external ADCs). The actual hardware DMA (or equivalent) buffers seem to always be a power of 2 in size, potentially down to 64 samples on newest devices. However various iOS power saving modes will increase the buffer size (by powers of 2) up to 4k samples, especially on older iOS devices. If you request a sample rate other than the hardware rate, the OS might resample the buffers to a different size than a power of 2, and this size can change from Audio Unit callback to subsequent callback if the resampling ratio isn't an exact integer.

    Audio Units are the lowest level accessible via public API on iOS devices. Everything else is built on top, and thus potentially incurs greater latencies. For instance, if you use the Audio Queue API with non-hardware buffer sizes, the OS will internally use power-of-2 audio buffers to access the hardware, and chop them up or fractionally concatenate them to return or fetch Audio Queue buffers of non-hardware sizes. Slower and jittery.

    Far from being half-baked, for a long time the iOS audio API was the only API usable on mobile phones and tablets for live low-latency music performance--but only by developing software matched to the hardware.
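Concretely, "do your own buffering" can be sketched like this (a minimal illustration with hypothetical names, not production-ready real-time code): a ring buffer decouples the fixed 10 ms network packets from whatever frame counts the hardware render callback actually requests.

```swift
// A minimal ring buffer sketch: the producer writes fixed 480-frame (10 ms)
// packets; the consumer reads whatever size the hardware chose (e.g. 512
// frames when ioBufferDuration comes back as 10.667 ms). The buffer absorbs
// the mismatch. NOTE: a real render callback needs atomic/lock-free indices;
// plain Ints are used here only to keep the sketch short.
final class FloatRingBuffer {
    private var storage: [Float]
    private var readIndex = 0
    private var writeIndex = 0
    private var count = 0

    init(capacity: Int) { storage = [Float](repeating: 0, count: capacity) }

    // Producer side: one fixed-size packet per call; excess samples are
    // dropped if the buffer is full (a real version would report overrun).
    func write(_ samples: [Float]) {
        for s in samples where count < storage.count {
            storage[writeIndex] = s
            writeIndex = (writeIndex + 1) % storage.count
            count += 1
        }
    }

    // Consumer side: called from the render callback with the hardware's
    // frame count. Underruns are padded with silence rather than blocking.
    func read(into out: inout [Float], frames: Int) {
        for i in 0..<frames {
            if count > 0 {
                out[i] = storage[readIndex]
                readIndex = (readIndex + 1) % storage.count
                count -= 1
            } else {
                out[i] = 0  // underrun: silence
            }
        }
    }
}
```

In practice the consumer side would live inside the `outputProvider` block, copying into the supplied `AudioBufferList` without allocating, and the indices would use atomics (or an existing lock-free structure such as TPCircularBuffer); the point is only that producer and consumer run at different, hardware-dictated granularities.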