I've got a moderately complicated AVAssetWriterInput setup that I'm using to be able to flip the camera while I'm recording. Basically I run two sessions; when the user taps to flip the camera, I disconnect session 1 from the output and attach session 2.
This works really well. I can export the video and it plays just fine.
Now that I'm trying to do more advanced stuff with the resulting video, some problems are popping up: the AVAssetTracks inside the exported AVAsset are slightly mismatched in duration (always by less than 1 frame). Specifically, I'm trying to do this: https://www.raywenderlich.com/6236502-avfoundation-tutorial-adding-overlays-and-animations-to-videos but a significant amount of the time there ends up being an all-black frame, sometimes at the head of the video, sometimes at the tail, that appears for a split second. How long it lasts varies, but it's always less than one frame (1/30 s, or 0.0333 s; see logs below).
I did a bit of back-and-forth debugging and managed to record a video using my recorder that consistently produced a trailing black frame, BUT using the tutorial code I have not been able to create a video that produces a trailing black frame. I added similar logging (to what's pasted below) to the tutorial code and I'm seeing deltas of no greater than 2/1000ths of a second. So around 1/10th of 1 frame at most. It's even a perfect 0 on one occasion.
So my sense right now is that what's happening is this: I record my video, both AVAssetWriterInputs start to gobble data, and then when I say "stop" they stop. The video input stops with the last complete frame, and the audio input does similarly. But since the audio input is sampling at a much higher rate than the video, they're not synced up perfectly and I end up with more audio than video. This isn't a problem until I compose an asset with the two tracks; then the composition engine thinks I mean "yes, actually use 100% of all the time for all tracks even if there is a mismatch", which results in the black screen.
(Edit: This is basically what's happening - https://blender.stackexchange.com/questions/6268/audio-track-and-video-track-are-not-the-same-length)
I think the correct solution is, instead of worrying about the composition construction and timing and making sure it's all right, just make the captured audio and video match up as nicely as possible. Ideally 0, but I'd be fine with anything around 1/10th of a frame.
So my question is: How do I make two AVAssetWriterInputs, one audio and one video, attached to an AVAssetWriter line up better? Is there a setting somewhere? Do I mess with the framerates? Should I just trim the exported asset to the length of the video track? Can I duplicate the last captured frame when I stop recording? Can I have it so that the inputs stop at different times - basically have the audio stop first and then wait for the video to 'catch up' and then stop the video? Something else? I'm at a loss for ideas here :|
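For reference, the "just trim it" option could look roughly like the sketch below. It is an illustration only, not code from my recorder: `makeTrimmedComposition` is a hypothetical helper, and it assumes the asset has exactly one video and one audio track. It copies only the time range that both tracks cover, so neither track overhangs the other at the head or the tail.

```swift
import AVFoundation

/// Hypothetical helper: builds a composition containing only the span
/// that the video and audio tracks have in common.
func makeTrimmedComposition(from asset: AVAsset) throws -> AVMutableComposition? {
    // tracks(withMediaType:) is the synchronous accessor; newer SDKs prefer
    // the async loadTracks(withMediaType:) variant.
    guard let video = asset.tracks(withMediaType: .video).first,
          let audio = asset.tracks(withMediaType: .audio).first else {
        return nil
    }

    // The overlap of the two track ranges; anything outside it is dropped.
    let shared = CMTimeRangeGetIntersection(video.timeRange, otherRange: audio.timeRange)

    let composition = AVMutableComposition()
    guard let videoOut = composition.addMutableTrack(withMediaType: .video,
                                                     preferredTrackID: kCMPersistentTrackID_Invalid),
          let audioOut = composition.addMutableTrack(withMediaType: .audio,
                                                     preferredTrackID: kCMPersistentTrackID_Invalid) else {
        return nil
    }

    // Insert the same source range for both tracks so they stay in sync.
    try videoOut.insertTimeRange(shared, of: video, at: .zero)
    try audioOut.insertTimeRange(shared, of: audio, at: .zero)
    return composition
}
```

This fixes things after the fact on the composition side, which is exactly the kind of timing bookkeeping I'd prefer to avoid by capturing matched tracks in the first place.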
MY LOGGING
BUFFER | VIdeo SETTINGS: Optional(["AVVideoCompressionPropertiesKey": {
AllowFrameReordering = 1;
AllowOpenGOP = 1;
AverageBitRate = 7651584;
**ExpectedFrameRate = 30;**
MaxKeyFrameIntervalDuration = 1;
MaxQuantizationParameter = 41;
MinimizeMemoryUsage = 1;
Priority = 80;
ProfileLevel = "HEVC_Main_AutoLevel";
RealTime = 1;
RelaxAverageBitRateTarget = 1;
SoftMinQuantizationParameter = 18;
}, "AVVideoCodecKey": hvc1, "AVVideoWidthKey": 1080, "AVVideoHeightKey": 1920])
BUFFER | AUDIO SETTINGS Optional(["AVNumberOfChannelsKey": 1, "AVFormatIDKey": 1633772320, **"AVSampleRateKey": 48000**])
BUFFER | asset duration: 0.5333333333333333
BUFFER | video track duration: 0.5066666666666667
BUFFER | Audio track duration: 0.5333333333333333
**BUFFER | Asset Delta: -0.026666666666666616**
BUFFER | asset duration: 0.384
BUFFER | video track duration: 0.37333333333333335
BUFFER | Audio track duration: 0.384
**BUFFER | Asset Delta: -0.010666666666666658**
BUFFER | asset duration: 0.9405416666666667
BUFFER | video track duration: 0.935
BUFFER | Audio track duration: 0.9405416666666667
**BUFFER | Asset Delta: -0.005541666666666667**
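Numbers in this shape can be read off an exported file roughly as follows. This is a sketch rather than the exact logging code that produced the output above: `trackDelta` and `logDurations` are hypothetical names, and the delta is taken as video minus audio, which matches the sign of the logged values (negative means the audio track is longer).

```swift
import AVFoundation

/// Video duration minus audio duration, in seconds.
/// Negative means the audio track runs longer than the video track.
func trackDelta(video: CMTime, audio: CMTime) -> Double {
    return video.seconds - audio.seconds
}

/// Prints asset and track durations. `prefix` is just a label for the log line.
func logDurations(of asset: AVAsset, prefix: String = "BUFFER") {
    // Synchronous accessors; newer SDKs prefer the async load(...) variants.
    guard let video = asset.tracks(withMediaType: .video).first,
          let audio = asset.tracks(withMediaType: .audio).first else {
        return
    }
    let videoDuration = video.timeRange.duration
    let audioDuration = audio.timeRange.duration
    print("\(prefix) | asset duration: \(asset.duration.seconds)")
    print("\(prefix) | video track duration: \(videoDuration.seconds)")
    print("\(prefix) | Audio track duration: \(audioDuration.seconds)")
    print("\(prefix) | Asset Delta: \(trackDelta(video: videoDuration, audio: audioDuration))")
}
```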
TUTORIAL LOGGING
COMPOSE | asset duration: 0.7333333333333333
COMPOSE | video track duration: 0.7333333333333333
COMPOSE | audio track duration: 0.7316666666666667
**Delta: ~0.00167**
COMPOSE | asset duration: 1.3333333333333333
COMPOSE | video track duration: 1.3333333333333333
COMPOSE | audio track duration: 1.3316666666666668
**Delta: ~0.00167**
COMPOSE | asset duration: 1.0316666666666667
COMPOSE | video track duration: 1.0316666666666667
COMPOSE | audio track duration: 1.0316666666666667
**Delta: 0 (wow)**
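To compare the two sets of logs in terms of frames rather than seconds, here is a small sketch. `deltaInFrames` is a hypothetical helper, the 30 fps default comes from the ExpectedFrameRate setting above, and the duration pairs are copied straight from the logs.

```swift
/// Expresses a duration mismatch in seconds as a fraction of one frame.
func deltaInFrames(_ deltaSeconds: Double, frameRate: Double = 30) -> Double {
    return deltaSeconds * frameRate
}

// (video track duration, audio track duration) pairs copied from the logs.
let recorderRuns: [(Double, Double)] = [
    (0.5066666666666667, 0.5333333333333333),
    (0.37333333333333335, 0.384),
    (0.935, 0.9405416666666667),
]
let tutorialRuns: [(Double, Double)] = [
    (0.7333333333333333, 0.7316666666666667),
    (1.3333333333333333, 1.3316666666666668),
    (1.0316666666666667, 1.0316666666666667),
]

for (video, audio) in recorderRuns + tutorialRuns {
    // Negative values mean the audio track is longer than the video track.
    print(deltaInFrames(video - audio))
}
```

My recorder comes out at roughly -0.8, -0.32 and -0.17 frames, while the tutorial videos are at about 0.05, 0.05 and 0 frames.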
TL;DR - don't just call AVAssetWriter.finishWriting {}, because then T_End is the time of the last sample written by any input (audio or video). Instead, call AVAssetWriter.endSession(atSourceTime:) first, to set T_End to be the time of the last written video frame.
AVCaptureVideoDataOutputSampleBufferDelegate TO THE RESCUE!!
Use AVCapture(Video|Audio)DataOutputSampleBufferDelegate to write buffers to the AVAssetWriter (attach delegates to AVCaptureVideoDataOutput and AVCaptureAudioDataOutput).
Once the session is started and your outputs are going, they're going to constantly be spitting out data onto this delegate.
(... main queue) will not have collisions or memory issues when reading the lastVideoFrameWrite
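The bookkeeping around lastVideoFrameWrite can be illustrated with a stripped-down sketch. It makes a few assumptions that are not confirmed by the code in this post: that finishRecordQueue is a serial DispatchQueue (its declaration isn't shown), a hypothetical `LastFrameTracker` class standing in for the controller, and plain `Double` seconds in place of CMTime so the example runs without AVFoundation.

```swift
import Dispatch

/// Hypothetical stand-in for the controller's "last written video frame" state.
/// @unchecked Sendable: all access to the stored property goes through `queue`.
final class LastFrameTracker: @unchecked Sendable {
    // A DispatchQueue without the .concurrent attribute is serial:
    // blocks run one at a time, in the order they were submitted.
    private let queue = DispatchQueue(label: "example.finishRecordQueue")
    private var lastVideoFrameWrite: Double?

    /// Called from the capture delegate after each video frame is appended.
    func recordWrite(at time: Double) {
        queue.async { self.lastVideoFrameWrite = time }
    }

    /// Called when stopping. Because it runs on the same serial queue, it
    /// executes after every write submitted before it and never overlaps one.
    func lastWrite() -> Double? {
        return queue.sync { lastVideoFrameWrite }
    }
}

let tracker = LastFrameTracker()
for frame in 1...100 {
    tracker.recordWrite(at: Double(frame) / 30.0)
}
let endTime = tracker.lastWrite()
```

The sketch reads the value with sync to keep it short; the stop code below does the read inside an async block on the same queue, which gives the same ordering guarantee without blocking the caller.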
Now for the fun part!
RESULTS
BUFFER | asset duration: 1.8683333333333334
BUFFER | video track duration: 1.8683333333333334
BUFFER | Audio track duration: 1.868
BUFFER | Asset Delta: 0.0003333333333332966
BUFFER | asset duration: 1.435
BUFFER | video track duration: 1.435
BUFFER | Audio track duration: 1.4343333333333332
BUFFER | Asset Delta: 0.0006666666666668153
BUFFER | asset duration: 1.8683333333333334
BUFFER | video track duration: 1.8683333333333334
BUFFER | Audio track duration: 1.8682291666666666
BUFFER | Asset Delta: 0.00010416666666679397
BUFFER | asset duration: 1.435
BUFFER | video track duration: 1.435
BUFFER | Audio track duration: 1.4343541666666666
BUFFER | Asset Delta: 0.0006458333333334565
LOOK AT THOSE DELTAS!!!!! All sub-millisecond. Very nice.
CODE
To Record
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard CMSampleBufferDataIsReady(sampleBuffer) else {
            return
        }
        if output == audioDataOutput {
            // PROCESS AUDIO BUFFER
        }
        if output == videoDataOutput {
            // PROCESS VIDEO BUFFER
        }
        // 1
        let writable = canWrite
        let time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        if writable && sessionAtSourceTime == nil {
            // 2
            // Only a video buffer may start the session, so the file starts on the first video frame.
            if output == videoDataOutput {
                sessionAtSourceTime = time
                videoWriter.startSession(atSourceTime: time)
            } else {
                // Drop audio that arrives before the first video frame.
                return
            }
        }
        if output == videoDataOutput && writable {
            if videoWriterInput != nil {
                if videoWriterInput.isReadyForMoreMediaData {
                    // Write video buffer
                    videoWriterInput.append(sampleBuffer)
                    // 3
                    // Remember the timestamp of the last video frame that was written.
                    WBufferCameraSessionController.finishRecordQueue.async {
                        self.lastVideoFrameWrite = time
                    }
                }
            }
        } else if writable,
                  output == audioDataOutput,
                  audioWriterInput != nil,
                  audioWriterInput.isReadyForMoreMediaData {
            // Write audio buffer
            audioWriterInput.append(sampleBuffer)
        }
        if output == videoDataOutput {
            bufferDelegate?.didOuputVideoBuffer(buffer: sampleBuffer)
        }
    }
Stop Recording
    func stopRecording() {
        guard isRecording else {
            return
        }
        guard isStoppingRecording == false else {
            return
        }
        isStoppingRecording = true
        WBufferCameraSessionController.finishRecordQueue.async {
            // 4
            // End the session at the last written video frame so the file has no audio-only tail.
            if let lastFrameTime = self.lastVideoFrameWrite {
                self.videoWriter.endSession(atSourceTime: lastFrameTime)
            }
            self.videoWriter.finishWriting {
                // cleanup, do stuff with finished file if writing was successful
                ...
            }
            ...
        }
    }