ios · swift · video · avfoundation

How to set separate transformations for videos when merging in AVFoundation?


I am looking to merge several videos together (all from different sources) in Swift with AVFoundation. The resulting video should be in portrait format.

The function I wrote merges the videos into a single video. However, videos recorded on a mobile phone (such as an iPhone) come out in landscape while the rest are in portrait, and the landscape video then gets stretched vertically to fit the portrait aspect ratio. It seems that the iPhone saves the video as landscape (even when it is recorded in portrait) and the system uses metadata to display it as portrait.
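
For reference, this is how that metadata can be inspected (a minimal sketch; the path is a placeholder for one of the clips):

import AVFoundation

// Minimal sketch: inspect a clip's track metadata.
// An iPhone portrait recording typically reports a landscape naturalSize
// (e.g. 1920x1080) plus a 90-degree preferredTransform that players apply
// at display time.
let videoURL = URL(fileURLWithPath: "clip.mov") // placeholder path
let asset = AVAsset(url: videoURL)
if let track = asset.tracks(withMediaType: .video).first {
  print("naturalSize: \(track.naturalSize)")
  print("preferredTransform: \(track.preferredTransform)")
}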

To combat this, I attempted to detect whether a video is landscape (or otherwise rotated) and then manually transform it to portrait. However, when I do this, the transformation seems to be applied to the entire track, which results in the whole composition rendering in landscape, with some of the videos in landscape and others in portrait. I can't figure out how to apply a transformation to only a single video. I've tried using multiple tracks, but then only one video is shown and the rest of the tracks are ignored. Here is an example of the exported video: it should render as 9:16, but with the transformation it renders as 16:9. Notice that the second clip is distorted even though it was originally recorded in portrait.

[video example of the issue]

Here's my code:

private static func mergeVideos(
    videoPaths: [URL],
    outputURL: URL,
    handler: @escaping (_ path: URL) -> Void
  ) {
    let videoComposition = AVMutableComposition()
    var lastTime: CMTime = .zero
    
    guard let videoCompositionTrack = videoComposition.addMutableTrack(withMediaType: .video, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
    // Audio needs its own composition track; inserting audio into a video track fails
    guard let audioCompositionTrack = videoComposition.addMutableTrack(withMediaType: .audio, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
    
    for path in videoPaths {
      let assetVideo = AVAsset(url: path)
      
      getTracks(assetVideo, .video) { videoTracks in
        // Add video track
        do {
          try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: videoTracks[0], at: lastTime)
          
          // Apply the original transform
          if let assetVideoTrack = assetVideo.tracks(withMediaType: AVMediaType.video).last {
            let t = assetVideoTrack.preferredTransform
            let size = assetVideoTrack.naturalSize
            
            let videoAssetOrientation: CGImagePropertyOrientation

            if size.width == t.tx && size.height == t.ty {
              print("down")
              
              videoAssetOrientation = .down
              videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: .pi) // 180 degrees
            } else if t.tx == 0 && t.ty == 0 {
              print("up")
              
              videoCompositionTrack.preferredTransform = assetVideoTrack.preferredTransform
              videoAssetOrientation = .up
            } else if t.tx == 0 && t.ty == size.width {
              print("left")
              
              videoAssetOrientation = .left
              videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: .pi / 2) // 90 degrees to the right

            } else {
              print("right")
              
              videoAssetOrientation = .right
              videoCompositionTrack.preferredTransform = CGAffineTransform(rotationAngle: -.pi / 2) // 90 degrees to the left
            }
          }
          
        } catch {
          print("Failed to insert video track")
          return
        }
        
        self.getTracks(assetVideo, .audio) { audioTracks in
          // Add audio track only if it exists
          if !audioTracks.isEmpty {
            do {
              try audioCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: audioTracks[0], at: lastTime)
            } catch {
              print("Failed to insert audio track")
              return
            }
          }
          
          // Update time
          lastTime = CMTimeAdd(lastTime, assetVideo.duration)
        }
      }
    }
        
    guard let exporter = AVAssetExportSession(asset: videoComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
    exporter.outputURL = outputURL
    exporter.outputFileType = AVFileType.mp4
    exporter.shouldOptimizeForNetworkUse = true
    exporter.exportAsynchronously(completionHandler: {
      switch exporter.status {
      case .failed:
        print("Export failed \(exporter.error!)")
      case .completed:
        print("completed export")
        handler(outputURL)
      default:
        break
      }
    })
  }

Anyone know what I am missing here? Any help is greatly appreciated.


Solution

  • A transform set on videoCompositionTrack affects the whole track. Use an AVVideoComposition instead: it applies video processing per time range through AVVideoCompositionInstruction objects and their layer instructions.

    Here is the code with the unimportant parts omitted; videoComposition is renamed to mainComposition to avoid confusion:

    private static func mergeVideos(
        videoPaths: [URL],
        outputURL: URL,
        handler: @escaping (_ path: URL) -> Void
    ) {
        let mainComposition = AVMutableComposition()
        var lastTime: CMTime = .zero
        
        guard let videoCompositionTrack = mainComposition.addMutableTrack(withMediaType: .video, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
        let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoCompositionTrack)
        
        for path in videoPaths {
          let assetVideo = AVAsset(url: path)
          
          getTracks(assetVideo, .video) { videoTracks in
            // Add video track
            do {
              try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: assetVideo.duration), of: videoTracks[0], at: lastTime)
                
              // Apply the original transform
              if let assetVideoTrack = assetVideo.tracks(withMediaType: AVMediaType.video).last {
                let t = assetVideoTrack.preferredTransform
                layerInstruction.setTransform(t, at: lastTime) // apply the transform to this clip's time range
              }
              
            } catch {
              print("Failed to insert video track")
              return
            }
            
            // deal with audio part ...
          }
        }
        
        let videoComposition = AVMutableVideoComposition()
        // A video composition must define a frame rate and render size;
        // 30 fps and 1080x1920 (portrait) are example values.
        videoComposition.frameDuration = CMTime(value: 1, timescale: 30)
        videoComposition.renderSize = CGSize(width: 1080, height: 1920)
        
        let instruction = AVMutableVideoCompositionInstruction()
        instruction.timeRange = CMTimeRange(start: .zero, end: lastTime)
        instruction.layerInstructions = [layerInstruction] // attach the per-clip transforms
        videoComposition.instructions = [instruction]
        
        guard let exporter = AVAssetExportSession(asset: mainComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
        
        // assign videoComposition to exporter
        exporter.videoComposition = videoComposition
        
        // other export part ...
    }
    

    PS. It would help if you included the getTracks(_:_:) method so the code is complete.
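
    Since it isn't shown, here is a hypothetical sketch of what it might look like, assuming it simply loads the asset's tracks asynchronously and calls back with those matching the requested media type:

    private static func getTracks(
        _ asset: AVAsset,
        _ mediaType: AVMediaType,
        completion: @escaping ([AVAssetTrack]) -> Void
    ) {
        // Load the "tracks" key before reading it; the callback may run on a
        // background queue, so the caller must synchronize (e.g. with a
        // DispatchGroup) before updating lastTime or starting the export.
        asset.loadValuesAsynchronously(forKeys: ["tracks"]) {
            var error: NSError?
            guard asset.statusOfValue(forKey: "tracks", error: &error) == .loaded else {
                completion([])
                return
            }
            completion(asset.tracks(withMediaType: mediaType))
        }
    }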