I am looking to merge several videos together (all from different sources) in Swift with AVFoundation. The resulting video should be in portrait format.
The function I wrote merges the videos into one video. However, videos taken on a mobile phone (such as an iPhone) seem to be exported in landscape while the rest are in portrait. The landscape video is then stretched vertically to fit the portrait aspect ratio. It seems that the iPhone stores the video frames in landscape (even when the video is recorded in portrait), and the system then uses the metadata (the track's preferredTransform) to display it as portrait.
To work around this, I tried to detect whether a video is landscape (or otherwise rotated) and then manually transform it to portrait. However, the transformation is applied to the entire track, so the whole composition renders in landscape, with some of the clips in landscape and others in portrait. I can't figure out how to apply a transformation to a single clip only. I've tried using multiple tracks, but then only one video is shown and the rest of the tracks are ignored. Here is an example of the exported video: it should render as 9:16, but with the transformation it renders as 16:9. Notice that the second clip is distorted even though it was originally recorded in portrait.
Here's my code:
    private static func mergeVideos(
        videoPaths: [URL],
        outputURL: URL,
        handler: @escaping (_ path: URL) -> Void
    ) {
        let videoComposition = AVMutableComposition()
        var lastTime: CMTime = .zero
        guard let videoCompositionTrack = videoComposition.addMutableTrack(withMediaType: .video, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
        for path in videoPaths {
            let assetVideo = AVAsset(url: path)
            getTracks(assetVideo, .video) { videoTracks in
                // Add video track
                do {
                    try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: videoTracks[0], at: lastTime)
                    // Apply the original transform
                    if let assetVideoTrack = assetVideo.tracks(withMediaType: AVMediaType.video).last {
                        let t = assetVideoTrack.preferredTransform
                        let size = assetVideoTrack.naturalSize
                        let videoAssetOrientation: CGImagePropertyOrientation
                        if size.width == t.tx && size.height == t.ty {
                            videoAssetOrientation = .down
                            videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: .pi) // 180 degrees
                        } else if t.tx == 0 && t.ty == 0 {
                            videoCompositionTrack.preferredTransform = assetVideoTrack.preferredTransform
                            videoAssetOrientation = .up
                        } else if t.tx == 0 && t.ty == size.width {
                            videoAssetOrientation = .left
                            videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: .pi / 2) // 90 degrees to the right
                        } else {
                            videoAssetOrientation = .right
                            videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: -.pi / 2) // 90 degrees to the left
                        }
                    }
                } catch {
                    print("Failed to insert video track")
                }
                self.getTracks(assetVideo, .audio) { audioTracks in
                    // Add audio track only if it exists
                    if !audioTracks.isEmpty {
                        do {
                            try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: audioTracks[0], at: lastTime)
                        } catch {
                            print("Failed to insert audio track")
                        }
                    }
                    // Update time
                    lastTime = CMTimeAdd(lastTime, assetVideo.duration)
                }
            }
        }
        guard let exporter = AVAssetExportSession(asset: videoComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
        exporter.outputURL = outputURL
        exporter.outputFileType = AVFileType.mp4
        exporter.shouldOptimizeForNetworkUse = true
        exporter.exportAsynchronously(completionHandler: {
            switch exporter.status {
            case .failed:
                print("Export failed \(exporter.error!)")
            case .completed:
                print("completed export")
            default:
                break
            }
        })
    }
Anyone know what I am missing here? Any help is greatly appreciated.
A transform set on videoCompositionTrack affects the whole track. Use an AVVideoComposition instead: its AVVideoCompositionInstruction objects, together with their layer instructions, let you apply video processing such as transforms at specific times.
Here is the code with the unimportant parts omitted. I renamed videoComposition to mainComposition to avoid confusion:
    private static func mergeVideos(
        videoPaths: [URL],
        outputURL: URL,
        handler: @escaping (_ path: URL) -> Void
    ) {
        let mainComposition = AVMutableComposition()
        var lastTime: CMTime = .zero
        guard let videoCompositionTrack = mainComposition.addMutableTrack(withMediaType: .video, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
        // One layer instruction for the composition track; its transform can change over time.
        let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoCompositionTrack)
        for path in videoPaths {
            let assetVideo = AVAsset(url: path)
            getTracks(assetVideo, .video) { videoTracks in
                // Add video track
                do {
                    try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: videoTracks[0], at: lastTime)
                    // Apply the original transform
                    if let assetVideoTrack = assetVideo.tracks(withMediaType: AVMediaType.video).last {
                        let t = assetVideoTrack.preferredTransform
                        layerInstruction.setTransform(t, at: lastTime) // apply the transform to the track from this time on
                    }
                } catch {
                    print("Failed to insert video track")
                }
                // deal with audio part and update lastTime, as in the question ...
            }
        }
        let videoComposition = AVMutableVideoComposition()
        let instruction = AVMutableVideoCompositionInstruction()
        instruction.timeRange = CMTimeRange(start: .zero, end: lastTime)
        // Attach the layer instruction, otherwise the per-clip transforms are never used.
        instruction.layerInstructions = [layerInstruction]
        videoComposition.instructions = [instruction]
        // Assumed values: a video composition needs a render size and a frame duration.
        // 1080x1920 at 30 fps is only an example for a 9:16 output; adjust as needed.
        videoComposition.renderSize = CGSize(width: 1080, height: 1920)
        videoComposition.frameDuration = CMTime(value: 1, timescale: 30)
        guard let exporter = AVAssetExportSession(asset: mainComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
        // assign videoComposition to exporter
        exporter.videoComposition = videoComposition
        // other export part ...
    }
PS. It would be better to include the getTracks(_:_:) method in the question so that the code is complete.
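Because the clips come from different sources, applying only each clip's preferredTransform may still leave some clips at the wrong size or position in the portrait frame. The following is a minimal sketch, not part of the original answer, of how a per-clip transform could orient a clip and then scale and center it (aspect fit) inside the render size. The helper names displaySize and aspectFitTransform are hypothetical, and the 1080x1920 render size and the sample iPhone transform are assumptions for illustration.

```swift
import CoreGraphics

/// Size of a clip as it is displayed, i.e. after its preferred transform is applied.
/// (Hypothetical helper, not an AVFoundation API.)
func displaySize(naturalSize: CGSize, preferredTransform: CGAffineTransform) -> CGSize {
    let rect = CGRect(origin: .zero, size: naturalSize).applying(preferredTransform)
    return CGSize(width: rect.width, height: rect.height)
}

/// Transform that orients a clip, scales it to fit inside `renderSize`
/// while keeping its aspect ratio, and centers it.
/// (Hypothetical helper, not an AVFoundation API.)
func aspectFitTransform(naturalSize: CGSize,
                        preferredTransform: CGAffineTransform,
                        renderSize: CGSize) -> CGAffineTransform {
    // Bounding box of the clip once its orientation is applied.
    let oriented = CGRect(origin: .zero, size: naturalSize).applying(preferredTransform)
    guard oriented.width > 0, oriented.height > 0 else { return preferredTransform }

    // Uniform scale so that the oriented clip fits inside the render size.
    let scale = min(renderSize.width / oriented.width,
                    renderSize.height / oriented.height)

    // Offsets that center the scaled clip in the render area.
    let offsetX = (renderSize.width - oriented.width * scale) / 2
    let offsetY = (renderSize.height - oriented.height * scale) / 2

    // Order: orient, move the bounding box to the origin, scale, then center.
    return preferredTransform
        .concatenating(CGAffineTransform(translationX: -oriented.minX, y: -oriented.minY))
        .concatenating(CGAffineTransform(scaleX: scale, y: scale))
        .concatenating(CGAffineTransform(translationX: offsetX, y: offsetY))
}

// Example: a clip stored as 1920x1080 with a 90 degree rotation in its metadata
// (assumed sample values, typical of a portrait phone recording).
let portraitTransform = CGAffineTransform(a: 0, b: 1, c: -1, d: 0, tx: 1080, ty: 0)
let shown = displaySize(naturalSize: CGSize(width: 1920, height: 1080),
                        preferredTransform: portraitTransform)
print(shown) // displayed size is 1080 x 1920
```

In the answer's loop, the result of aspectFitTransform could be passed to layerInstruction.setTransform(_:at:) in place of the raw preferredTransform, using the same size that is assigned to the video composition's renderSize.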