Tags: macos, cocoa, audio, avfoundation, avassetreader

CMSampleBufferRef, AVAssetReaderAudioMixOutput and PCM problems


So I am using an AVAssetReaderAudioMixOutput to extract audio samples from a QuickTime file. In this case it is a ProRes video with multiple tracks of audio.

(4 tracks, 16-bit little-endian interleaved samples @ 48000 Hz)

I can get the video frames fine, but when I call [myAssetReaderAudioMixOutput copyNextSampleBuffer] I run into an odd problem: all of the returned audio appears to be in the first channel.

Using the individual track output readers, the first audio sample from each track for the first frame is:

620B 700E 0000 0000

But when I use AVAssetReaderAudioMixOutput I get:

D219 0000 0000 0000

(Notice that 620B + 700E = D219.) So it looks like the AVAssetReaderAudioMixOutput is summing the values across the 4 channels and giving me the result in track 1.
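(The arithmetic holds up once byte order is accounted for: since the samples are little-endian, the dumped bytes 62 0B and 70 0E are the 16-bit values 0x0B62 and 0x0E70, and 0x0B62 + 0x0E70 = 0x19D2, which dumps back out as D2 19. So a straight per-sample sum of the tracks fits the data exactly.)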

Can anyone explain why, and how to fix it? I need a solution that gives me a 1:1 mapping of the channels as they are in the QuickTime file, i.e. it needs to work for files with 1 channel of audio and also for files with 16 channels.

I got the correct values for the first sample by calling copyNextSampleBuffer on each audio channel/track by itself.

This is the dictionary I used to create myAssetReaderAudioMixOutput:

NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:48000], AVSampleRateKey,
[NSNumber numberWithInt:4], AVNumberOfChannelsKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];

myAssetReaderAudioMixOutput = [AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:audioTracks audioSettings: outputSettings];
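(One way to sanity-check what the mix output is actually handing back is to inspect the format description on the first buffer it returns. A minimal sketch, assuming the AVAssetReader has already had startReading called on it; note this consumes a buffer, so do it on a throwaway read:)

CMSampleBufferRef buf = [myAssetReaderAudioMixOutput copyNextSampleBuffer];
if (buf) {
    CMAudioFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(buf);
    const AudioStreamBasicDescription *asbd =
        CMAudioFormatDescriptionGetStreamBasicDescription(fmt);
    // With the settings above this should report 4 channels, 16 bits, 48000 Hz
    NSLog(@"channels=%u bits=%u rate=%.0f",
          (unsigned)asbd->mChannelsPerFrame,
          (unsigned)asbd->mBitsPerChannel,
          asbd->mSampleRate);
    CFRelease(buf);   // copyNextSampleBuffer returns a +1 retained buffer
}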

I am using the following bit of code to read the actual audio samples/data from the CMSampleBuffer:

CMSampleBufferRef audioBuffer = [myAssetReaderAudioMixOutput copyNextSampleBuffer];
if (audioBuffer) {
    CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(audioBuffer);

    // lets get some more info about our SampleBuffer, or at least sample size for sample 0!
    CMTime sampleDuration = CMSampleBufferGetDuration(audioBuffer);
    size_t sampleSize = CMSampleBufferGetSampleSize(audioBuffer, 0);
    CMItemCount numSamplesInBuffer = CMSampleBufferGetNumSamples(audioBuffer);

    bfAudioBuffer *pbfBuffer = new bfAudioBuffer();
    int samplesNeededForThisFrame = 1920;   // sample for FrameNo(frameNo, vidMode);
    size_t sizeOfDataToBeCopied = samplesNeededForThisFrame * sampleSize;

    // Audio samples for 1 frame's worth of audio should be copied into pbfBuffer->pPcmBuffer
    CMBlockBufferCopyDataBytes(audioBlockBuffer, 0, sizeOfDataToBeCopied, pbfBuffer->pPcmBuffer);

    CFRelease(audioBuffer);   // copyNextSampleBuffer returns a +1 retained buffer
}
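(As an aside, inside the if (audioBuffer) block above, Core Media also offers CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer, which fills out an AudioBufferList for you and saves some pointer arithmetic. A rough sketch under the interleaved settings from the dictionary; for non-interleaved output the AudioBufferList would need a slot per channel:)

AudioBufferList abl;
CMBlockBufferRef blockBuffer = NULL;
OSStatus status = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
    audioBuffer,
    NULL,                          // size-needed out param, not required here
    &abl, sizeof(abl),             // one AudioBuffer is enough for interleaved data
    kCFAllocatorDefault, kCFAllocatorDefault,
    0,
    &blockBuffer);
if (status == noErr) {
    // Interleaved 16-bit PCM: a single buffer, mNumberChannels == channel count
    int16_t *samples = (int16_t *)abl.mBuffers[0].mData;
    // ... read samples here, then balance the retain from the call:
    CFRelease(blockBuffer);
}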


So I think my problem is either in setting up the dictionary or in reading the samples. I use the same system to read the samples for a single track, so I doubt the reading is at fault. I just cannot understand why it gives me the correct amount of data/samples for 4 tracks, but then only puts information in the first track.

Lastly, I am on OS X; I don't care about iOS.

Thanks for any help, this has been VERY frustrating!


Solution

  • Right, I finally found an answer to this issue, so I thought I would update my question with the solution.

    The problem was in my understanding of what AVAssetReaderAudioMixOutput actually does.

    I thought it would give me a mix of multiple audio tracks, but it is actually MEANT to mix the tracks in a user-specified manner and then return a single track of audio. (Keep in mind that a "track" here could be a single track of stereo sound.)

    In order to get multi-track sound out of the file, I need an AVAssetReader for every track I want to extract; see the sketch below.

    Hope someone finds this helpful
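    (A rough sketch of that arrangement, assuming `asset` is the AVAsset for the same file; the variable names are mine, not anything from AVFoundation. AVNumberOfChannelsKey is left out so each track keeps its own channel count:)

    NSDictionary *trackSettings =
    [NSDictionary dictionaryWithObjectsAndKeys:
     [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
     [NSNumber numberWithFloat:48000], AVSampleRateKey,
     [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
     [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
     nil];

    NSMutableArray *trackReaders = [NSMutableArray array];   // keep the readers alive
    NSMutableArray *trackOutputs = [NSMutableArray array];
    for (AVAssetTrack *track in [asset tracksWithMediaType:AVMediaTypeAudio]) {
        NSError *error = nil;
        AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
        AVAssetReaderTrackOutput *output =
            [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
                                                       outputSettings:trackSettings];
        if (reader && [reader canAddOutput:output]) {
            [reader addOutput:output];
            [reader startReading];
            [trackReaders addObject:reader];
            [trackOutputs addObject:output];
        }
    }
    // Then, per video frame, call copyNextSampleBuffer on each output in turn
    // so the tracks stay in step, and CFRelease each buffer when done.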