Tags: ios, metal, instruments, core-image, metalkit

High CPU usage with CoreImage vs Metal


I am processing CVPixelBuffers received from the camera using both Metal and Core Image, and comparing the performance. The only processing done is taking a source pixel buffer, applying a crop and affine transforms, and saving the result to another pixel buffer. What I notice is that CPU usage is as high as 50% when using Core Image but only 20% when using Metal. The profiler shows most of the time is spent in CIContext.render:

        // Crop the source to the destination aspect ratio.
        let cropRect = AVMakeRect(aspectRatio: CGSize(width: dstWidth, height: dstHeight),
                                  insideRect: srcImage.extent)
        var dstImage = srcImage.cropped(to: cropRect)

        // Move the cropped image back to the origin.
        let translationTransform = CGAffineTransform(translationX: -cropRect.minX, y: -cropRect.minY)
        dstImage = dstImage.transformed(by: translationTransform)

        // Scale so the image fills the destination buffer.
        let scale = max(dstWidth / dstImage.extent.width, dstHeight / dstImage.extent.height)
        dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: scale, y: scale))

        if flipVertical {
            dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: 1, y: -1))
            dstImage = dstImage.transformed(by: CGAffineTransform(translationX: 0, y: dstImage.extent.size.height))
        }

        if flipHorizontal {
            dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: -1, y: 1))
            dstImage = dstImage.transformed(by: CGAffineTransform(translationX: dstImage.extent.size.width, y: 0))
        }

        _ciContext.render(dstImage, to: dstPixelBuffer!, bounds: dstImage.extent, colorSpace: srcImage.colorSpace)

Here is how the CIContext was created:

    _ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!,
                           options: [CIContextOption.cacheIntermediates: false])

Am I doing anything wrong, and what could be done to lower CPU usage with Core Image?


Solution

  • Every time you render a CIImage with a CIContext, Core Image performs a filter graph analysis to determine the best path for rendering the image (determining intermediates, region of interest, kernel concatenation, etc.). This can be quite CPU-intensive. Keeping the graph small, for instance by collapsing the whole transform chain into a single CGAffineTransform, gives that analysis less to do (see the first sketch at the end of this answer).

    If you only have a few simple operations to perform on your image, and you can easily implement them directly in Metal, you are probably better off doing that (see the second sketch at the end of this answer).

    However, I would also suggest that you file Feedback with the Core Image team and report your findings. We also observe very heavy CPU load caused by Core Image in our apps. Maybe they can find a way to further optimize the graph analysis, especially for consecutive render calls with the same instructions.
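
    Here is a minimal sketch of the first idea, reusing the variable names from the question (srcImage, dstWidth, dstHeight, flipVertical, flipHorizontal): the crop, scale, and flips are folded into one CGAffineTransform and applied with a single transformed(by:) call, so the graph Core Image has to analyze contains one crop node and one transform node. Whether that measurably lowers the per-frame analysis cost is an assumption you would need to verify in the profiler.

        import AVFoundation
        import CoreImage

        func makeOutputImage(srcImage: CIImage,
                             dstWidth: CGFloat, dstHeight: CGFloat,
                             flipVertical: Bool, flipHorizontal: Bool) -> CIImage {
            let cropRect = AVMakeRect(aspectRatio: CGSize(width: dstWidth, height: dstHeight),
                                      insideRect: srcImage.extent)
            let scale = max(dstWidth / cropRect.width, dstHeight / cropRect.height)

            // Move the crop to the origin, then scale, all in one matrix.
            var t = CGAffineTransform(translationX: -cropRect.minX, y: -cropRect.minY)
            t = t.concatenating(CGAffineTransform(scaleX: scale, y: scale))

            // Fold the flips into the same matrix.
            if flipVertical {
                t = t.concatenating(CGAffineTransform(scaleX: 1, y: -1))
                t = t.concatenating(CGAffineTransform(translationX: 0, y: cropRect.height * scale))
            }
            if flipHorizontal {
                t = t.concatenating(CGAffineTransform(scaleX: -1, y: 1))
                t = t.concatenating(CGAffineTransform(translationX: cropRect.width * scale, y: 0))
            }

            // One crop + one transform in the filter graph.
            return srcImage.cropped(to: cropRect).transformed(by: t)
        }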
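
    For the Metal route, here is a rough sketch of the same crop-and-scale done with MetalPerformanceShaders, bypassing Core Image entirely. It assumes the pixel buffers are BGRA and IOSurface-backed (i.e. Metal-compatible); the MetalScaler class, the texture-cache setup, and the MPSScaleTransform math are illustrative, not drop-in code.

        import Metal
        import MetalPerformanceShaders
        import CoreVideo

        final class MetalScaler {
            private let device: MTLDevice
            private let queue: MTLCommandQueue
            private let scaler: MPSImageBilinearScale
            private var textureCache: CVMetalTextureCache!

            init?() {
                guard let device = MTLCreateSystemDefaultDevice(),
                      let queue = device.makeCommandQueue() else { return nil }
                self.device = device
                self.queue = queue
                self.scaler = MPSImageBilinearScale(device: device)
                guard CVMetalTextureCacheCreate(nil, nil, device, nil, &textureCache) == kCVReturnSuccess else {
                    return nil
                }
            }

            // Wraps a pixel buffer as a Metal texture via the texture cache (zero-copy).
            private func texture(from buffer: CVPixelBuffer) -> MTLTexture? {
                var cvTexture: CVMetalTexture?
                CVMetalTextureCacheCreateTextureFromImage(nil, textureCache, buffer, nil,
                                                          .bgra8Unorm,
                                                          CVPixelBufferGetWidth(buffer),
                                                          CVPixelBufferGetHeight(buffer),
                                                          0, &cvTexture)
                return cvTexture.flatMap { CVMetalTextureGetTexture($0) }
            }

            // Crops (via the translate terms) and scales src into dst on the GPU.
            func render(src: CVPixelBuffer, dst: CVPixelBuffer, cropOrigin: CGPoint, scale: Double) {
                guard let srcTexture = texture(from: src),
                      let dstTexture = texture(from: dst),
                      let commandBuffer = queue.makeCommandBuffer() else { return }

                var transform = MPSScaleTransform(scaleX: scale, scaleY: scale,
                                                  translateX: -Double(cropOrigin.x) * scale,
                                                  translateY: -Double(cropOrigin.y) * scale)
                withUnsafePointer(to: &transform) { ptr in
                    scaler.scaleTransform = ptr
                    scaler.encode(commandBuffer: commandBuffer,
                                  sourceTexture: srcTexture,
                                  destinationTexture: dstTexture)
                }
                commandBuffer.commit()
                // Blocking keeps the sketch simple; use a completion handler in production.
                commandBuffer.waitUntilCompleted()
            }
        }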