
How to convert 2 mono files into a single stereo file in iOS?

I'm trying to merge two CAF files locally into a single file. Both are mono streams, and ideally I'd like the result to be a stereo file, so that I can have the mic on one channel and the speaker on the other.

I originally started with AVAssetTrack and AVMutableCompositionTrack, but I couldn't get the mixing right: my merged file was a single mono stream that interleaved the two files. So I've opted to go the AVAudioEngine route.

From my understanding, I can pass in my two files as input nodes, attach them to a mixer, and have an output node that is able to obtain the stereo mix. The output file has a stereo layout (I can open it in Audacity and see both channels), but no audio data seems to be written to it. Placing a dispatch semaphore signal around the installTapOnBus call did not help much either. Any insight would be appreciated, as Core Audio has been a challenge to understand.

// obtain path of microphone and speaker files
NSString *micPath = [[NSBundle mainBundle] pathForResource:@"microphone" ofType:@"caf"];
NSString *spkPath = [[NSBundle mainBundle] pathForResource:@"speaker" ofType:@"caf"];
NSURL *micURL = [NSURL fileURLWithPath:micPath];
NSURL *spkURL = [NSURL fileURLWithPath:spkPath];

// create engine
AVAudioEngine *engine = [[AVAudioEngine alloc] init];

AVAudioFormat *stereoFormat = [[AVAudioFormat alloc] initStandardFormatWithSampleRate:16000 channels:2];

AVAudioMixerNode *mainMixer = engine.mainMixerNode;

// create audio files
AVAudioFile *audioFile1 = [[AVAudioFile alloc] initForReading:micURL error:nil];
AVAudioFile *audioFile2 = [[AVAudioFile alloc] initForReading:spkURL error:nil];

// create player input nodes
AVAudioPlayerNode *apNode1 = [[AVAudioPlayerNode alloc] init];
AVAudioPlayerNode *apNode2 = [[AVAudioPlayerNode alloc] init];

// attach nodes to the engine
[engine attachNode:apNode1];
[engine attachNode:apNode2];

// connect player nodes to engine's main mixer
stereoFormat = [mainMixer outputFormatForBus:0];
[engine connect:apNode1 to:mainMixer fromBus:0 toBus:0 format:audioFile1.processingFormat];
[engine connect:apNode2 to:mainMixer fromBus:0 toBus:1 format:audioFile2.processingFormat];
[engine connect:mainMixer to:engine.outputNode format:stereoFormat];

// start the engine
NSError *error = nil;
if(![engine startAndReturnError:&error]){
    NSLog(@"Engine failed to start.");
}

// create output file
NSString *mergedAudioFile = [[micPath stringByDeletingLastPathComponent] stringByAppendingPathComponent:@"merged.caf"];
[[NSFileManager defaultManager] removeItemAtPath:mergedAudioFile error:&error];
NSURL *mergedURL = [NSURL fileURLWithPath:mergedAudioFile];
AVAudioFile *outputFile = [[AVAudioFile alloc] initForWriting:mergedURL settings:[engine.inputNode inputFormatForBus:0].settings error:&error];

// write from buffer to output file
[mainMixer installTapOnBus:0 bufferSize:4096 format:[mainMixer outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when){
    NSError *error;
    BOOL success;
    if((outputFile.length < audioFile1.length) || (outputFile.length < audioFile2.length)){
        success = [outputFile writeFromBuffer:buffer error:&error];
        NSCAssert(success, @"error writing buffer data to file, %@", [error localizedDescription]);
        if(error){
            NSLog(@"Error: %@", error);
        }
    }
    else{
        [mainMixer removeTapOnBus:0];
        NSLog(@"Done writing");
    }
}];



  • Doing this with ExtAudioFile involves three files and three buffers: two mono for reading, and one stereo for writing. In a loop, each mono file reads a small segment of audio into its mono buffer, which is then copied into the correct "half" of the stereo buffer. Once the stereo buffer is full of data, write it to the output file, and repeat until both mono files have finished reading (writing zeros if one mono file is longer than the other).
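
    The loop described above can be tried without any Core Audio calls. The following is a minimal sketch in plain C (which also compiles as Objective-C), assuming the two mono "files" are just in-memory arrays of 16-bit samples. The names `read_mono`, `merge_mono_to_stereo` and `FRAMES_PER_READ` are made up for this illustration and are not part of any Apple API.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define FRAMES_PER_READ 4 /* tiny on purpose, so the chunking is visible */

    /* Hypothetical stand-in for a mono file reader: copies up to `want`
       samples from `src` starting at *pos, returns how many were copied. */
    static size_t read_mono(const int16_t *src, size_t len, size_t *pos,
                            int16_t *dst, size_t want)
    {
        size_t left = len - *pos;
        size_t got = want < left ? want : left;
        memcpy(dst, src + *pos, got * sizeof(int16_t));
        *pos += got;
        return got;
    }

    /* Merge two mono streams into separate left/right output channels.
       out_l and out_r must each hold max(l_len, r_len) samples.
       The shorter stream is padded with zeros. Returns frames written. */
    static size_t merge_mono_to_stereo(const int16_t *l, size_t l_len,
                                       const int16_t *r, size_t r_len,
                                       int16_t *out_l, int16_t *out_r)
    {
        int16_t lbuf[FRAMES_PER_READ], rbuf[FRAMES_PER_READ];
        size_t lpos = 0, rpos = 0, written = 0;

        assert(out_l && out_r);
        for (;;) {
            size_t lgot = read_mono(l, l_len, &lpos, lbuf, FRAMES_PER_READ);
            size_t rgot = read_mono(r, r_len, &rpos, rbuf, FRAMES_PER_READ);
            size_t frames = lgot > rgot ? lgot : rgot;
            if (frames == 0) break; /* both inputs are exhausted */

            /* Zero whatever part of each chunk the read did not fill. */
            memset(lbuf + lgot, 0, (FRAMES_PER_READ - lgot) * sizeof(int16_t));
            memset(rbuf + rgot, 0, (FRAMES_PER_READ - rgot) * sizeof(int16_t));

            /* Copy each mono chunk into its "half" of the stereo output. */
            memcpy(out_l + written, lbuf, frames * sizeof(int16_t));
            memcpy(out_r + written, rbuf, frames * sizeof(int16_t));
            written += frames;
        }
        return written;
    }
    ```

    Note that the unread tail is zeroed right after each read, so a short final chunk on one side never carries stale samples from the previous pass.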

    The most problematic area for me is getting the file formats right; Core Audio wants very specific formats. Luckily, AVAudioFormat exists to simplify the creation of some common formats.

    Each audio file reader/writer has two formats: one that represents the format the data is stored in (the file format), and one that dictates the format of the data going into or coming out of the reader/writer (the client format). The readers/writers have format converters built in, in case the two formats differ.
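
    To make the difference between a non-interleaved client format and an interleaved file format concrete, here is a small plain C sketch of the layout change. It only illustrates the idea; ExtAudioFile performs this conversion internally, and `interleave_stereo` is a hypothetical name, not a framework function.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Turn two separate channel buffers (non-interleaved: LLLL... and RRRR...)
       into one interleaved buffer (LRLRLR...). `out` must hold 2 * frames. */
    static void interleave_stereo(const int16_t *left, const int16_t *right,
                                  size_t frames, int16_t *out)
    {
        assert(left && right && out);
        for (size_t i = 0; i < frames; i++) {
            out[2 * i]     = left[i];  /* even slots carry the left channel */
            out[2 * i + 1] = right[i]; /* odd slots carry the right channel */
        }
    }
    ```

    With a non-interleaved client format, your code fills one buffer per channel, and the writer's built-in converter produces the interleaved layout on disk.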

    Here's an example:

    #import <AVFoundation/AVFoundation.h>
    #import <AudioToolbox/AudioToolbox.h>

    //Helper functions, defined further down
    AudioBufferList *createBufferList(int bufferSize, int numberBuffers);
    void freeBufferList(AudioBufferList *bufferList);
    NSString *documentsDir(void);

    //The function name is only illustrative; put this code wherever suits your app
    void mergeMonoFilesToStereo(void){
        //This is the format the readers will output
        AVAudioFormat *monoClientFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatInt16 sampleRate:44100 channels:1 interleaved:NO];
        //This is the format the writer will take as input
        AVAudioFormat *stereoClientFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatInt16 sampleRate:44100 channels:2 interleaved:NO];
        //This is the format that will be written to storage.  It must be interleaved.
        AVAudioFormat *stereoFileFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatInt16 sampleRate:44100 channels:2 interleaved:YES];

        NSURL *leftURL = [NSBundle.mainBundle URLForResource:@"left" withExtension:@"wav"];
        NSURL *rightURL = [NSBundle.mainBundle URLForResource:@"right" withExtension:@"wav"];
        //The output is created as a CAF file below, so give it a matching extension
        NSString *stereoPath = [documentsDir() stringByAppendingPathComponent:@"stereo.caf"];
        //A file system path needs fileURLWithPath:, not URLWithString:
        NSURL *stereoURL = [NSURL fileURLWithPath:stereoPath];

        ExtAudioFileRef leftReader;
        ExtAudioFileRef rightReader;
        ExtAudioFileRef stereoWriter;
        OSStatus status = 0;

        //Create readers and writer
        status = ExtAudioFileOpenURL((__bridge CFURLRef)leftURL, &leftReader);
        //All the ExtAudioFile functions return a non-zero status if there's an error.
        //I'm only checking one to demonstrate, but you should check every return value.
        if(status)printf("error %i\n",(int)status);
        ExtAudioFileOpenURL((__bridge CFURLRef)rightURL, &rightReader);
        //Here the file format is set to stereo interleaved.
        ExtAudioFileCreateWithURL((__bridge CFURLRef)stereoURL, kAudioFileCAFType, stereoFileFormat.streamDescription, NULL, kAudioFileFlags_EraseFile, &stereoWriter);

        //Set client format for readers and writer
        ExtAudioFileSetProperty(leftReader, kExtAudioFileProperty_ClientDataFormat, sizeof(AudioStreamBasicDescription), monoClientFormat.streamDescription);
        ExtAudioFileSetProperty(rightReader, kExtAudioFileProperty_ClientDataFormat, sizeof(AudioStreamBasicDescription), monoClientFormat.streamDescription);
        ExtAudioFileSetProperty(stereoWriter, kExtAudioFileProperty_ClientDataFormat, sizeof(AudioStreamBasicDescription), stereoClientFormat.streamDescription);

        int framesPerRead = 4096;
        int bufferSize = framesPerRead * sizeof(SInt16);

        //Allocate memory for the buffers
        AudioBufferList *leftBuffer = createBufferList(bufferSize,1);
        AudioBufferList *rightBuffer = createBufferList(bufferSize,1);
        AudioBufferList *stereoBuffer = createBufferList(bufferSize,2);

        //ExtAudioFileRead takes an ioNumberFrames argument.  On input it's the number of frames you want, on output it's the number of frames you got.  0 means you're done.
        UInt32 leftFramesIO = framesPerRead;
        UInt32 rightFramesIO = framesPerRead;

        while (leftFramesIO || rightFramesIO) {
            if (leftFramesIO){
                //Ask for a full buffer and read into the left buffer
                leftFramesIO = framesPerRead;
                leftBuffer->mBuffers[0].mDataByteSize = bufferSize;
                if (ExtAudioFileRead(leftReader, &leftFramesIO, leftBuffer) != noErr) leftFramesIO = 0;
                //If fewer frames than a full buffer were read, zero out the remainder of the buffer
                int framesRemaining = framesPerRead - leftFramesIO;
                if (framesRemaining){
                    memset(((SInt16 *)leftBuffer->mBuffers[0].mData) + leftFramesIO, 0, sizeof(SInt16) * framesRemaining);
                }
            }
            else{
                //set to zero if no more frames to read
                memset(leftBuffer->mBuffers[0].mData, 0, sizeof(SInt16) * framesPerRead);
            }

            if (rightFramesIO){
                rightFramesIO = framesPerRead;
                rightBuffer->mBuffers[0].mDataByteSize = bufferSize;
                if (ExtAudioFileRead(rightReader, &rightFramesIO, rightBuffer) != noErr) rightFramesIO = 0;
                int framesRemaining = framesPerRead - rightFramesIO;
                if (framesRemaining){
                    memset(((SInt16 *)rightBuffer->mBuffers[0].mData) + rightFramesIO, 0, sizeof(SInt16) * framesRemaining);
                }
            }
            else{
                memset(rightBuffer->mBuffers[0].mData, 0, sizeof(SInt16) * framesPerRead);
            }

            UInt32 stereoFrames = MAX(leftFramesIO, rightFramesIO);
            //Both files are finished, nothing left to write
            if (stereoFrames == 0) break;

            //copy left to stereoLeft and right to stereoRight
            memcpy(stereoBuffer->mBuffers[0].mData, leftBuffer->mBuffers[0].mData, sizeof(SInt16) * stereoFrames);
            memcpy(stereoBuffer->mBuffers[1].mData, rightBuffer->mBuffers[0].mData, sizeof(SInt16) * stereoFrames);

            //write to file
            stereoBuffer->mBuffers[0].mDataByteSize = stereoFrames * sizeof(SInt16);
            stereoBuffer->mBuffers[1].mDataByteSize = stereoFrames * sizeof(SInt16);
            ExtAudioFileWrite(stereoWriter, stereoFrames, stereoBuffer);
        }

        //Close the files and release the buffers
        ExtAudioFileDispose(leftReader);
        ExtAudioFileDispose(rightReader);
        ExtAudioFileDispose(stereoWriter);

        freeBufferList(leftBuffer);
        freeBufferList(rightBuffer);
        freeBufferList(stereoBuffer);
    }

    AudioBufferList *createBufferList(int bufferSize, int numberBuffers){
        assert(bufferSize > 0 && numberBuffers > 0);
        //AudioBufferList already contains one AudioBuffer, so only add room for the extra ones
        int bufferlistByteSize = sizeof(AudioBufferList);
        bufferlistByteSize += sizeof(AudioBuffer) * (numberBuffers - 1);
        AudioBufferList *bufferList = malloc(bufferlistByteSize);
        bufferList->mNumberBuffers = numberBuffers;
        for (int i = 0; i < numberBuffers; i++) {
            bufferList->mBuffers[i].mNumberChannels = 1;
            bufferList->mBuffers[i].mDataByteSize = bufferSize;
            bufferList->mBuffers[i].mData = malloc(bufferSize);
        }
        return bufferList;
    }

    void freeBufferList(AudioBufferList *bufferList){
        for (UInt32 i = 0; i < bufferList->mNumberBuffers; i++) {
            free(bufferList->mBuffers[i].mData);
        }
        free(bufferList);
    }

    NSString *documentsDir(void){
        static NSString *path = NULL;
        if (!path){
            path = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES).firstObject;
        }
        return path;
    }