Tags: swift, macos, audio, avfoundation, avaudioengine

AVAudioPlayerNode causing distortion


I have an AVAudioPlayerNode attached to an AVAudioEngine. Sample buffers are provided to playerNode via the scheduleBuffer() method.

However, it seems like playerNode is distorting the audio. Rather than simply "passing through" the buffers, the output is distorted and contains static (but is still mostly audible).

Relevant code:

let myBufferFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)

// Configure player node
let playerNode = AVAudioPlayerNode()
audioEngine.attach(playerNode)
audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: myBufferFormat)

// Provide audio buffers to playerNode
for await buffer in mySource.streamAudio() {
    await playerNode.scheduleBuffer(buffer)
}

In the example above, mySource.streamAudio() is providing audio in realtime from a ScreenCaptureKit SCStreamDelegate. The audio buffers arrive as CMSampleBuffer, are converted to AVAudioPCMBuffer, then passed along via AsyncStream to the audio engine above. I've verified that the converted buffers are valid.
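
For context, the AsyncStream bridge described above looks roughly like this. It's a sketch with a hypothetical type name (AudioSource) standing in for mySource; the CMSampleBuffer-to-AVAudioPCMBuffer conversion it calls, createPCMBuffer(from:), is the function shown further down in this post.

import AVFoundation
import ScreenCaptureKit

// Sketch: the SCStreamOutput callback converts each CMSampleBuffer to an
// AVAudioPCMBuffer and yields it into an AsyncStream that the engine-side
// loop above consumes.
final class AudioSource: NSObject, SCStreamOutput {

    private var continuation: AsyncStream<AVAudioPCMBuffer>.Continuation?

    // The engine side iterates this stream with `for await buffer in ...`
    func streamAudio() -> AsyncStream<AVAudioPCMBuffer> {
        AsyncStream { continuation in
            self.continuation = continuation
        }
    }

    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of type: SCStreamOutputType) {
        // createPCMBuffer(from:) is the conversion function shown later in this post
        guard type == .audio, let pcmBuffer = createPCMBuffer(from: sampleBuffer) else { return }
        continuation?.yield(pcmBuffer)
    }
}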

Maybe the buffers don't arrive fast enough? This graph of ~25,000 output frames suggests that the player node is periodically inserting segments of zero-valued frames:

[Graph of ~25,000 output frames showing periodic segments of zero-valued samples]

The distortion seems to be a result of these empty frames.
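
One way to quantify this (a diagnostic sketch, not code from the original pipeline) is to install a tap on the player node's output and count zero-valued samples; the 4096-frame buffer size is an arbitrary choice:

import AVFoundation

// Diagnostic sketch: tap the player node's output and count zero-valued samples,
// to confirm that silence is being inserted between scheduled buffers.
func installZeroSampleCounter(on playerNode: AVAudioPlayerNode) {
    let format = playerNode.outputFormat(forBus: 0)
    playerNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        guard let samples = buffer.floatChannelData?[0] else { return }
        var zeroCount = 0
        for frame in 0..<Int(buffer.frameLength) where samples[frame] == 0 {
            zeroCount += 1
        }
        if zeroCount > 0 {
            print("Zero samples in this tap buffer: \(zeroCount) of \(buffer.frameLength)")
        }
    }
}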

Edit:

Even if we remove AsyncStream from the pipeline and handle the buffers immediately within the ScreenCaptureKit callback, the distortion persists. Here's an end-to-end example that can be run as-is (the important part is didOutputSampleBuffer):

import AVFoundation
import ScreenCaptureKit

class Recorder: NSObject, SCStreamOutput {
    
    private let audioEngine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private var stream: SCStream?
    private let queue = DispatchQueue(label: "sampleQueue", qos: .userInitiated)
    
    func setupEngine() {
        let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)
        audioEngine.attach(playerNode)
        // playerNode --> mainMixerNode --> outputNode --> speakers
        audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: format)
        audioEngine.prepare()
        try? audioEngine.start()
        playerNode.play()
    }
    
    func startCapture() async {
        // Capture audio from Safari
        let availableContent = try! await SCShareableContent.excludingDesktopWindows(true, onScreenWindowsOnly: false)
        let display = availableContent.displays.first!
        let app = availableContent.applications.first(where: {$0.applicationName == "Safari"})!
        let filter = SCContentFilter(display: display, including: [app], exceptingWindows: [])
        let config = SCStreamConfiguration()
        config.capturesAudio = true
        config.sampleRate = 48000
        config.channelCount = 2
        stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try! stream!.addStreamOutput(self, type: .audio, sampleHandlerQueue: queue)
        try! stream!.addStreamOutput(self, type: .screen, sampleHandlerQueue: queue) // To prevent warnings
        try! await stream!.startCapture()
    }
    
    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of type: SCStreamOutputType) {
        switch type {
        case .audio:
            let pcmBuffer = createPCMBuffer(from: sampleBuffer)!
            playerNode.scheduleBuffer(pcmBuffer, completionHandler: nil)
        default:
            break // Ignore video frames
        }
    }
    
    func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
        var ablPointer: UnsafePointer<AudioBufferList>?
        try? sampleBuffer.withAudioBufferList { audioBufferList, blockBuffer in
            ablPointer = audioBufferList.unsafePointer
        }
        guard let audioBufferList = ablPointer,
              let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
              let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate, channels: absd.mChannelsPerFrame) else { return nil }
        return AVAudioPCMBuffer(pcmFormat: format, bufferListNoCopy: audioBufferList)
    }
    
}

let recorder = Recorder()
recorder.setupEngine()
Task {
    await recorder.startCapture()
}

Solution

  • The culprit was the createPCMBuffer() function. Replace it with this and everything runs smoothly:

    func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
        let numSamples = AVAudioFrameCount(sampleBuffer.numSamples)
        // Build the format from the sample buffer's own format description,
        // so it matches the actual PCM layout of the captured audio
        let format = AVAudioFormat(cmAudioFormatDescription: sampleBuffer.formatDescription!)
        let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: numSamples)!
        pcmBuffer.frameLength = numSamples
        // Copy the PCM data out of the CMSampleBuffer into memory owned by the AVAudioPCMBuffer
        CMSampleBufferCopyPCMDataIntoAudioBufferList(sampleBuffer, at: 0, frameCount: Int32(numSamples), into: pcmBuffer.mutableAudioBufferList)
        return pcmBuffer
    }
    

    The original function in my question was taken directly from Apple's ScreenCaptureKit example project. It technically works, and the audio sounds fine when written to file, but apparently it's not fast enough for realtime audio.

    Edit: Actually it's probably not about speed, since the new function is 2-3x slower on average due to copying the data. The more likely explanation is that the original function built the AVAudioPCMBuffer as a no-copy wrapper (bufferListNoCopy) around memory it didn't own: the pointer obtained inside withAudioBufferList isn't guaranteed to stay valid once that closure returns, so the underlying data could be released before the player node actually rendered the scheduled buffer.
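
    To illustrate the lifetime issue in isolation (a generic sketch, not ScreenCaptureKit-specific): a pointer handed to a with...-style closure is only guaranteed to be valid for the duration of that closure, so escaping it, as the original createPCMBuffer() does with withAudioBufferList, risks a dangling reference.

    // Generic illustration of the hazard; `escaped` must not be used after the closure returns.
    var escaped: UnsafeBufferPointer<Float>?
    let samples: [Float] = [0.1, 0.2, 0.3]
    samples.withUnsafeBufferPointer { buffer in
        escaped = buffer   // the pointer is only guaranteed valid inside this closure
    }
    // Dereferencing `escaped` here is undefined behavior; the memory may no longer be valid.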