I am processing CVPixelBuffers received from the camera using both Metal and Core Image and comparing the performance. The only processing is a crop and an affine transform applied to the source pixel buffer, with the result written to another pixel buffer. What I notice is that CPU usage is as high as 50% with Core Image but only 20% with Metal. The profiler shows that most of the time is spent in CIContext render:
let cropRect = AVMakeRect(aspectRatio: CGSize(width: dstWidth, height: dstHeight), insideRect: srcImage.extent)

// Crop the source to the destination aspect ratio.
var dstImage = srcImage.cropped(to: cropRect)

// Move the cropped image back to the origin.
let translationTransform = CGAffineTransform(translationX: -cropRect.minX, y: -cropRect.minY)
var transform = CGAffineTransform.identity
transform = transform.concatenating(CGAffineTransform(translationX: -(dstImage.extent.origin.x + dstImage.extent.width / 2), y: -(dstImage.extent.origin.y + dstImage.extent.height / 2)))
transform = transform.concatenating(translationTransform)
transform = transform.concatenating(CGAffineTransform(translationX: dstImage.extent.origin.x + dstImage.extent.width / 2, y: dstImage.extent.origin.y + dstImage.extent.height / 2))
dstImage = dstImage.transformed(by: translationTransform)

// Scale to fill the destination dimensions.
let scale = max(dstWidth / dstImage.extent.width, dstHeight / dstImage.extent.height)
let scalingTransform = CGAffineTransform(scaleX: scale, y: scale)
transform = CGAffineTransform.identity
transform = transform.concatenating(scalingTransform)
dstImage = dstImage.transformed(by: transform)

// Optional mirroring.
if flipVertical {
    dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: 1, y: -1))
    dstImage = dstImage.transformed(by: CGAffineTransform(translationX: 0, y: dstImage.extent.size.height))
}
if flipHorizontal {
    dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: -1, y: 1))
    dstImage = dstImage.transformed(by: CGAffineTransform(translationX: dstImage.extent.size.width, y: 0))
}

// Render into the destination pixel buffer.
var dstBounds = CGRect.zero
dstBounds.size = dstImage.extent.size
_ciContext.render(dstImage, to: dstPixelBuffer!, bounds: dstImage.extent, colorSpace: srcImage.colorSpace)
Here is how the CIContext was created:
_ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!, options: [CIContextOption.cacheIntermediates: false])
Am I doing anything wrong, and what can be done to lower the CPU usage with Core Image?
Every time you render a CIImage with a CIContext, Core Image performs a filter graph analysis to determine the best path for rendering the image (determining intermediates, region of interest, kernel concatenation, and so on). This can be quite CPU-intensive.
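If you want to stay with Core Image, it may be worth trying the CIRenderDestination API, which lets you enqueue the render as a task and decide when (or whether) to wait for it. I can't say whether it avoids the per-frame graph analysis, but it gives the framework more room to pipeline work. A minimal sketch, reusing dstImage, dstPixelBuffer, srcImage, and _ciContext from your code:

import CoreImage

// Describe the render target; CIRenderDestination can wrap a CVPixelBuffer directly.
let destination = CIRenderDestination(pixelBuffer: dstPixelBuffer!)
destination.colorSpace = srcImage.colorSpace

do {
    // startTask(toRender:to:) enqueues the render and returns immediately.
    let task = try _ciContext.startTask(toRender: dstImage, to: destination)
    // Block only when the result is actually needed.
    _ = try task.waitUntilCompleted()
} catch {
    print("Core Image render failed: \(error)")
}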
If you only have a few simple operations to perform on your image, and you can easily implement them in Metal directly, you are probably better off using that.
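For a crop plus scale specifically, the Metal path can stay very small if you use MetalPerformanceShaders. The sketch below is only an illustration under assumptions I'm making up (BGRA pixel buffers, a CVMetalTextureCache created up front, and hypothetical helpers makeTexture and cropAndScale); it is not meant to be your existing Metal implementation:

import CoreGraphics
import CoreVideo
import Metal
import MetalPerformanceShaders

// Hypothetical helper: wrap a CVPixelBuffer as an MTLTexture via a CVMetalTextureCache.
// Assumes a BGRA buffer; keep the CVMetalTexture alive until the GPU work is finished.
func makeTexture(from pixelBuffer: CVPixelBuffer,
                 cache: CVMetalTextureCache) -> (CVMetalTexture, MTLTexture)? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, nil,
                                              .bgra8Unorm, width, height, 0, &cvTexture)
    guard let cvTexture = cvTexture, let texture = CVMetalTextureGetTexture(cvTexture) else { return nil }
    return (cvTexture, texture)
}

// Hypothetical helper: crop + scale on the GPU with MPSImageLanczosScale.
// The MPSScaleTransform maps the crop rectangle of the source onto the full destination.
func cropAndScale(src: CVPixelBuffer, dst: CVPixelBuffer, cropRect: CGRect,
                  device: MTLDevice, queue: MTLCommandQueue, cache: CVMetalTextureCache) {
    guard let (srcHolder, srcTex) = makeTexture(from: src, cache: cache),
          let (dstHolder, dstTex) = makeTexture(from: dst, cache: cache),
          let commandBuffer = queue.makeCommandBuffer() else { return }

    let scaleX = Double(dstTex.width) / Double(cropRect.width)
    let scaleY = Double(dstTex.height) / Double(cropRect.height)
    var transform = MPSScaleTransform(scaleX: scaleX, scaleY: scaleY,
                                      translateX: -Double(cropRect.minX) * scaleX,
                                      translateY: -Double(cropRect.minY) * scaleY)

    let scaler = MPSImageLanczosScale(device: device)
    withUnsafePointer(to: &transform) { ptr in
        scaler.scaleTransform = ptr
        scaler.encode(commandBuffer: commandBuffer, sourceTexture: srcTex, destinationTexture: dstTex)
    }

    // Keep the CVMetalTextures (and thus the backing pixel buffers) alive until the GPU finishes.
    commandBuffer.addCompletedHandler { _ in _ = (srcHolder, dstHolder) }
    commandBuffer.commit()
}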
However, I would also suggest you file a Feedback with the Core Image team and report your findings. We also observe very heavy CPU load caused by Core Image in our apps. Maybe they can find a way to further optimize the graph analysis, especially for consecutive render calls with the same instructions.