I'm using AVAssetWriter to write audio CMSampleBuffers to an mp4 file, but when I later read that file using AVAssetReader, it seems to be missing the initial chunk of data.
Here's the debug description of the first CMSampleBuffer passed to the writer input's append method (notice the trim duration attachment of 1024/44100):
CMSampleBuffer 0x102ea5b60 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
TrimDurationAtStart = {
epoch = 0;
flags = 1;
timescale = 44100;
value = 1024;
}
formatDescription = <CMAudioFormatDescription 0x281fd9720 [0x1c061f840]> {
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific: {
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x2
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }
cookie: {<CFData 0x2805f50a0 [0x1c061f840]>{length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102}}
ACL: {(null)}
FormatList Array: {
Index: 0
ChannelLayoutTag: 0x650002
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }}
}
extensions: {(null)}
}
sbufToTrackReadiness = 0x0
numSamples = 1
outputPTS = {6683542167/44100 = 151554.244, rounded}(based on cachedOutputPresentationTimeStamp)
sampleTimingArray[1] = {
{PTS = {6683541143/44100 = 151554.221, rounded}, DTS = {6683541143/44100 = 151554.221, rounded}, duration = {1024/44100 = 0.023}},
}
sampleSizeArray[1] = {
sampleSize = 163,
}
dataBuffer = 0x281cc7a80
Here's the debug description of the second CMSampleBuffer (notice the trim duration attachment of 1088/44100, which combined with the previous buffer's 1024 yields the standard priming value of 2112):
CMSampleBuffer 0x102e584f0 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
TrimDurationAtStart = {
epoch = 0;
flags = 1;
timescale = 44100;
value = 1088;
}
formatDescription = <CMAudioFormatDescription 0x281fd9720 [0x1c061f840]> {
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific: {
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x2
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }
cookie: {<CFData 0x2805f50a0 [0x1c061f840]>{length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102}}
ACL: {(null)}
FormatList Array: {
Index: 0
ChannelLayoutTag: 0x650002
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }}
}
extensions: {(null)}
}
sbufToTrackReadiness = 0x0
numSamples = 1
outputPTS = {6683543255/44100 = 151554.269, rounded}(based on cachedOutputPresentationTimeStamp)
sampleTimingArray[1] = {
{PTS = {6683542167/44100 = 151554.244, rounded}, DTS = {6683542167/44100 = 151554.244, rounded}, duration = {1024/44100 = 0.023}},
}
sampleSizeArray[1] = {
sampleSize = 179,
}
dataBuffer = 0x281cc4750
Now, when I read the audio track using AVAssetReader, the first CMSampleBuffer I get is:
CMSampleBuffer 0x102ed7b20 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
EmptyMedia(P) = true
formatDescription = (null)
sbufToTrackReadiness = 0x0
numSamples = 0
outputPTS = {0/1 = 0.000}(based on outputPresentationTimeStamp)
sampleTimingArray[1] = {
{PTS = {0/1 = 0.000}, DTS = {INVALID}, duration = {0/1 = 0.000}},
}
dataBuffer = 0x0
and the next one contains priming info of 1088/44100:
CMSampleBuffer 0x10318bc00 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
FillDiscontinuitiesWithSilence(P) = true
GradualDecoderRefresh(P) = 1
TrimDurationAtStart(P) = {
epoch = 0;
flags = 1;
timescale = 44100;
value = 1088;
}
IsGradualDecoderRefreshAuthoritative(P) = false
formatDescription = <CMAudioFormatDescription 0x281fdcaa0 [0x1c061f840]> {
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific: {
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }
cookie: {<CFData 0x2805f3800 [0x1c061f840]>{length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102}}
ACL: {Stereo (L R)}
FormatList Array: {
Index: 0
ChannelLayoutTag: 0x650002
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0 }}
}
extensions: {{
VerbatimISOSampleEntry = {length = 87, bytes = 0x00000057 6d703461 00000000 00000001 ... 12100680 80800102 };
}}
}
sbufToTrackReadiness = 0x0
numSamples = 43
outputPTS = {83/600 = 0.138}(based on outputPresentationTimeStamp)
sampleTimingArray[1] = {
{PTS = {1024/44100 = 0.023}, DTS = {1024/44100 = 0.023}, duration = {1024/44100 = 0.023}},
}
sampleSizeArray[43] = {
sampleSize = 179,
sampleSize = 173,
sampleSize = 178,
sampleSize = 172,
sampleSize = 172,
sampleSize = 159,
sampleSize = 180,
sampleSize = 200,
sampleSize = 187,
sampleSize = 189,
sampleSize = 206,
sampleSize = 192,
sampleSize = 195,
sampleSize = 186,
sampleSize = 183,
sampleSize = 189,
sampleSize = 211,
sampleSize = 198,
sampleSize = 204,
sampleSize = 211,
sampleSize = 204,
sampleSize = 202,
sampleSize = 218,
sampleSize = 210,
sampleSize = 206,
sampleSize = 207,
sampleSize = 221,
sampleSize = 219,
sampleSize = 236,
sampleSize = 219,
sampleSize = 227,
sampleSize = 225,
sampleSize = 225,
sampleSize = 229,
sampleSize = 225,
sampleSize = 236,
sampleSize = 233,
sampleSize = 231,
sampleSize = 249,
sampleSize = 234,
sampleSize = 250,
sampleSize = 249,
sampleSize = 259,
}
dataBuffer = 0x281cde370
The input's append method keeps returning true, which in principle means that all sample buffers got appended, but the reader for some reason skips the first chunk of data. Is there anything I'm doing wrong here?
I'm using the following code to read the file:
let asset = AVAsset(url: fileURL)
guard let assetReader = try? AVAssetReader(asset: asset) else {
    return
}
asset.loadValuesAsynchronously(forKeys: ["tracks"]) {
    guard let audioTrack = asset.tracks(withMediaType: .audio).first else { return }
    let audioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: nil)
    assetReader.add(audioOutput) // attach the output before reading starts
    assetReader.startReading()
    while assetReader.status == .reading {
        if let sampleBuffer = audioOutput.copyNextSampleBuffer() {
            // do something
        }
    }
}
First some pedantry: you haven't lost your first sample buffer, but rather the first packet within your first sample buffer.
The behaviour of AVAssetReader with nil outputSettings when reading AAC packet data has changed on iOS 13 and macOS 10.15 (Catalina).
Previously you would get the first AAC packet, that packet's presentation timestamp (zero) and a trim attachment instructing you to discard the usual first 2112 frames of decoded audio.
Now [iOS 13, macOS 10.15] AVAssetReader seems to discard the first packet, leaving you with the second packet, whose presentation timestamp is 1024, so you need only discard 2112 - 1024 = 1088 of the decoded frames.
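You can verify this yourself by pulling the trim attachment off each buffer the reader vends. A minimal sketch (assuming sampleBuffer comes from copyNextSampleBuffer() in a loop like yours):

import CoreMedia

func primingTrim(of sampleBuffer: CMSampleBuffer) -> CMTime? {
    // TrimDurationAtStart is stored as a CFDictionary representation of a CMTime.
    guard let attachment = CMGetAttachment(sampleBuffer,
                                           key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                                           attachmentModeOut: nil) else { return nil }
    return CMTimeMakeFromDictionary((attachment as! CFDictionary))
}

On iOS 13 / macOS 10.15 the first non-empty buffer reports 1088/44100 rather than the conventional 2112/44100.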
Something that might not be immediately obvious in the above situations is that AVAssetReader is talking about TWO timelines, not one. The packet timestamps refer to one of them, the untrimmed timeline, and the trim instruction implies the existence of the other: the trimmed timeline.
The transformation from untrimmed to trimmed timestamps is very simple: it's usually trimmed = untrimmed - 2112.
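As a worked example with the numbers above (just a sketch of the arithmetic in CMTime terms):

import CoreMedia

let priming = CMTime(value: 2112, timescale: 44100)       // standard AAC priming
let untrimmedPTS = CMTime(value: 1024, timescale: 44100)  // second packet's PTS
let trimmedPTS = CMTimeSubtract(untrimmedPTS, priming)
print(trimmedPTS.value)  // -1088: exactly the frames the trim attachment discards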
So is the new behaviour a bug? The fact that if you decode to LPCM and correctly follow the trim instructions you should still get the same audio leads me to believe the change was intentional (NB: I haven't yet personally confirmed that the LPCM samples are the same).
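If you want to check that claim, one way is to read the track decoded to LPCM and count frames; with LPCM output AVAssetReader applies the trim for you. A sketch, assuming fileURL points at the written file:

import AVFoundation

let asset = AVAsset(url: fileURL)
let reader = try! AVAssetReader(asset: asset)  // force-try for brevity
let track = asset.tracks(withMediaType: .audio).first!
let lpcmOutput = AVAssetReaderTrackOutput(track: track,
                                          outputSettings: [AVFormatIDKey: kAudioFormatLinearPCM])
reader.add(lpcmOutput)
reader.startReading()
var decodedFrames: Int64 = 0
while let buffer = lpcmOutput.copyNextSampleBuffer() {
    decodedFrames += Int64(CMSampleBufferGetNumSamples(buffer))
}
print(decodedFrames)  // should be the untrimmed frame count minus 2112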
However, the documentation says:
A value of nil for outputSettings configures the output to vend samples in their original format as stored by the specified track.
I don't think you can both discard packets [even the first one, which is basically a constant] and claim to be vending samples in their "original format", so from this point of view I think the change has a bug-like quality.
I also think it's an unfortunate change, as I used to consider nil-outputSettings AVAssetReader to be a sort of "raw" mode, but now it assumes your only use case is decoding to LPCM.
There's only one thing that could downgrade "unfortunate" to "serious bug", and that's if this new "let's pretend the first AAC packet doesn't exist" approach extends to files created with AVAssetWriter, because that would break interoperability with non-AVAssetReader code, where the number of frames to trim has congealed to a constant 2112 frames. I also haven't personally confirmed this. Do you have a file created with the above sample buffers that you can share?
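For reference, the trim that non-AVAssetReader code sees is the priming stored in the file's packet table, which you can dump with AudioToolbox. A sketch, assuming fileURL is your written mp4:

import AudioToolbox

var audioFile: AudioFileID?
guard AudioFileOpenURL(fileURL as CFURL, .readPermission, kAudioFileMPEG4Type, &audioFile) == noErr,
      let file = audioFile else { fatalError("could not open file") }
var info = AudioFilePacketTableInfo()
var size = UInt32(MemoryLayout<AudioFilePacketTableInfo>.size)
if AudioFileGetProperty(file, kAudioFilePropertyPacketTableInfo, &size, &info) == noErr {
    // mPrimingFrames should be the conventional 2112 if interoperability is intact.
    print("priming: \(info.mPrimingFrames), remainder: \(info.mRemainderFrames), valid: \(info.mNumberValidFrames)")
}
AudioFileClose(file)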
p.s. I don't think your input sample buffers are relevant here; I think you'd lose the first packet reading from any AAC file. However, your input sample buffers do seem slightly unusual in that they have hosttime [capture session?] style timestamps yet are AAC, and carry only one packet per sample buffer, which isn't very many and seems like a lot of overhead for 23 ms of audio. Are you creating them yourself in an AVCaptureSession -> AVAudioConverter chain?
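If so, a rough sketch of that kind of chain, purely as a guess at your setup (the PCM input here is just silence, not your capture data):

import AVFoundation

let pcmFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 2)!
var aacDesc = AudioStreamBasicDescription(mSampleRate: 44100, mFormatID: kAudioFormatMPEG4AAC,
                                          mFormatFlags: 0, mBytesPerPacket: 0,
                                          mFramesPerPacket: 1024, mBytesPerFrame: 0,
                                          mChannelsPerFrame: 2, mBitsPerChannel: 0, mReserved: 0)
let aacFormat = AVAudioFormat(streamDescription: &aacDesc)!
let converter = AVAudioConverter(from: pcmFormat, to: aacFormat)!

let pcmBuffer = AVAudioPCMBuffer(pcmFormat: pcmFormat, frameCapacity: 1024)!
pcmBuffer.frameLength = 1024  // 1024 frames of silence

let aacBuffer = AVAudioCompressedBuffer(format: aacFormat, packetCapacity: 1,
                                        maximumPacketSize: converter.maximumOutputPacketSize)
var error: NSError?
let status = converter.convert(to: aacBuffer, error: &error) { _, outStatus in
    // Keep offering input until the converter has produced its one packet.
    outStatus.pointee = .haveData
    return pcmBuffer
}
print(status == .haveData)  // true once a packet has been produced

Each filled AVAudioCompressedBuffer here holds a single 1024-frame AAC packet, which would explain the one-packet sample buffers in your dumps.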