I've developed an app that receives video frames from WebRTC, resizes them, and then runs a Core ML model for prediction. The source video frames arrive as CVPixelBuffers in the 420f pixel format (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange), which my Core ML model doesn't accept, so I'm using Metal (MTLTexture) to convert them to BGRA. Here's the relevant code:
// sourcePixelBuffer (420f) -> original CVPixelBuffer (BGRA) & resized CVPixelBuffer (BGRA)
public func processYUV420Frame2(sourcePixelBuffer: CVPixelBuffer, targetSize: CGSize)
    -> (original: CVPixelBuffer?, resized: CVPixelBuffer?) {
    guard let queue = self.commandQueue else {
        print("FrameMixer: command queue is not available")
        return (nil, nil)
    }

    let sourceWidth = CVPixelBufferGetWidth(sourcePixelBuffer)
    let sourceHeight = CVPixelBufferGetHeight(sourcePixelBuffer)

    guard let (yTexture, uvTexture) = createYUVTexturesFromPixelBuffer(sourcePixelBuffer) else {
        print("Failed to create YUV textures")
        return (nil, nil)
    }

    var originalBuffer: CVPixelBuffer?
    var resizedBuffer: CVPixelBuffer?
    autoreleasepool {
        let (originalTexture, resizedTexture) = convertYUVtoDualBGRA(
            device: metalDevice!,
            commandQueue: queue,
            yTexture: yTexture,
            uvTexture: uvTexture,
            sourceWidth: sourceWidth,
            sourceHeight: sourceHeight,
            targetWidth: Int(targetSize.width),
            targetHeight: Int(targetSize.height)
        )
        originalBuffer = createCVPixelBuffer(from: originalTexture)
        resizedBuffer = createCVPixelBuffer(from: resizedTexture)
    }
    return (originalBuffer, resizedBuffer)
}
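For context, this code relies on a few stored properties (metalDevice, commandQueue, textureCache, dualOutputPipelineState) that are created once up front. A minimal setup sketch, assuming a compute kernel named "convertYUVtoDualBGRA" in the default Metal library (the kernel name and exact property layout are illustrative):

import Metal
import CoreVideo

final class FrameMixer {
    var metalDevice: MTLDevice?
    var commandQueue: MTLCommandQueue?
    var textureCache: CVMetalTextureCache?
    var dualOutputPipelineState: MTLComputePipelineState?

    init?() {
        guard let device = MTLCreateSystemDefaultDevice() else { return nil }
        metalDevice = device
        commandQueue = device.makeCommandQueue()

        // Texture cache used by CVMetalTextureCacheCreateTextureFromImage.
        var cache: CVMetalTextureCache?
        guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache) == kCVReturnSuccess else {
            return nil
        }
        textureCache = cache

        // Compute pipeline for the YUV -> dual BGRA kernel.
        guard let library = device.makeDefaultLibrary(),
              let kernel = library.makeFunction(name: "convertYUVtoDualBGRA"),
              let pipeline = try? device.makeComputePipelineState(function: kernel) else {
            return nil
        }
        dualOutputPipelineState = pipeline
    }
}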
func createYUVTexturesFromPixelBuffer(_ pixelBuffer: CVPixelBuffer) -> (y: MTLTexture, uv: MTLTexture)? {
    guard let textureCache = textureCache else {
        print("make buffer failed, texture cache does not exist")
        return nil
    }

    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    var textureY: CVMetalTexture?
    var textureUV: CVMetalTexture?
    CVMetalTextureCacheFlush(textureCache, 0)

    // Create the Y (luma) plane texture
    CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault,
        textureCache,
        pixelBuffer,
        nil,
        .r8Unorm,
        width,
        height,
        0,
        &textureY
    )

    // Create the UV (chroma) plane texture
    CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault,
        textureCache,
        pixelBuffer,
        nil,
        .rg8Unorm,
        width / 2,
        height / 2,
        1,
        &textureUV
    )

    guard let unwrappedTextureY = textureY, let unwrappedTextureUV = textureUV else {
        return nil
    }

    let y = CVMetalTextureGetTexture(unwrappedTextureY)!
    let uv = CVMetalTextureGetTexture(unwrappedTextureUV)!
    textureY = nil
    textureUV = nil
    return (y, uv)
}
func convertYUVtoDualBGRA(device: MTLDevice,
                          commandQueue: MTLCommandQueue,
                          yTexture: MTLTexture,
                          uvTexture: MTLTexture,
                          sourceWidth: Int,
                          sourceHeight: Int,
                          targetWidth: Int,
                          targetHeight: Int) -> (original: MTLTexture, resized: MTLTexture) {
    let originalTexture = createBGRATexture(device: device, width: sourceWidth, height: sourceHeight)
    let resizedTexture = createBGRATexture(device: device, width: targetWidth, height: targetHeight)

    guard let commandBuffer = commandQueue.makeCommandBuffer(),
          let computeEncoder = commandBuffer.makeComputeCommandEncoder() else {
        fatalError("Failed to create command buffer or compute encoder")
    }

    computeEncoder.setComputePipelineState(dualOutputPipelineState!)
    computeEncoder.setTexture(yTexture, index: 0)
    computeEncoder.setTexture(uvTexture, index: 1)
    computeEncoder.setTexture(originalTexture, index: 2)
    computeEncoder.setTexture(resizedTexture, index: 3)

    var uniforms = DualOutputUniforms(sourceWidth: Float(sourceWidth), sourceHeight: Float(sourceHeight),
                                      targetWidth: Float(targetWidth), targetHeight: Float(targetHeight))
    computeEncoder.setBytes(&uniforms, length: MemoryLayout<DualOutputUniforms>.size, index: 0)

    let threadGroupSize = MTLSizeMake(16, 16, 1)
    let threadGroups = MTLSizeMake((sourceWidth + threadGroupSize.width - 1) / threadGroupSize.width,
                                   (sourceHeight + threadGroupSize.height - 1) / threadGroupSize.height,
                                   1)
    computeEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
    computeEncoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    return (originalTexture, resizedTexture)
}
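The helpers createBGRATexture and createCVPixelBuffer(from:) are not shown above; a minimal sketch of what they might look like (using a plain CVPixelBufferCreate plus a CPU copy via getBytes, rather than a pixel buffer pool):

func createBGRATexture(device: MTLDevice, width: Int, height: Int) -> MTLTexture {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    // The compute kernel writes into these textures.
    descriptor.usage = [.shaderRead, .shaderWrite]
    return device.makeTexture(descriptor: descriptor)!
}

func createCVPixelBuffer(from texture: MTLTexture) -> CVPixelBuffer? {
    let attributes: [CFString: Any] = [
        kCVPixelBufferIOSurfacePropertiesKey: [:],
        kCVPixelBufferMetalCompatibilityKey: true
    ]
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     texture.width,
                                     texture.height,
                                     kCVPixelFormatType_32BGRA,
                                     attributes as CFDictionary,
                                     &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    // Copy the texture contents into the pixel buffer on the CPU.
    let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
    texture.getBytes(CVPixelBufferGetBaseAddress(buffer)!,
                     bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                     from: region,
                     mipmapLevel: 0)
    return buffer
}

In a real per-frame pipeline, drawing these buffers from a CVPixelBufferPool instead of calling CVPixelBufferCreate every frame would avoid allocating a fresh IOSurface for each output.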
This pipeline works without memory issues on its own, and the format is correctly converted from 420f to BGRA. However, when the resized CVPixelBuffer is used as input to the Core ML model while video frames are processed continuously, resources are not released properly: memory usage climbs to around 4.5 GB and the app crashes on iPad.
Are there any tips or methods to solve this memory leak? How can I ensure proper resource management when continuously running Core ML predictions on video frames?
I found that the problem is in the code below, where I read the returned outputShapedArray to obtain the float array:
func runMidasWithResized(on pixelbuffer: CVPixelBuffer) -> [Float]? {
    var results: [Float]? = nil
    guard let prediction = try? mlmodel!.prediction(input: pixelbuffer) else {
        os_log("Prediction failed", type: .error)
        return nil
    }

    // Get result into float array format
    var shapedArray = prediction.outputShapedArray
    results = Array(repeating: 0.0, count: shapedArray.strides[0])
    shapedArray.withUnsafeMutableShapedBufferPointer { bufferPointer, shape, strides in
        results = Array(bufferPointer)
    }
    return results
}
The resources created here never seem to be released.
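(As an aside, a pattern commonly recommended for continuous per-frame inference is to wrap each conversion-plus-prediction pass in an explicit autoreleasepool, so that autoreleased Core ML and Core Video objects are drained every frame rather than accumulating. A minimal sketch with a hypothetical handleIncomingFrame call site and an illustrative target size:)

func handleIncomingFrame(_ sourcePixelBuffer: CVPixelBuffer) {
    autoreleasepool {
        // Everything autoreleased while converting and predicting is
        // drained when this block ends, once per frame.
        let (_, resized) = processYUV420Frame2(sourcePixelBuffer: sourcePixelBuffer,
                                               targetSize: CGSize(width: 256, height: 256)) // illustrative size
        guard let resized = resized else { return }
        _ = runMidasWithResized(on: resized)
    }
}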
I modified the method to get the result as below:
// Get result into float array format
let output = prediction.output
if let bufferPointer = try? UnsafeBufferPointer<Float>(output) {
    results = Array(bufferPointer)
}
return results
With this change, the allocated resources are released normally.
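For completeness, the revised method with this change applied end to end might look as follows (a sketch assembled from the snippets above; output is assumed to be the generated model class's MLMultiArray output property, and the UnsafeBufferPointer<Float> conversion is the same one used in the modified snippet):

func runMidasWithResized(on pixelbuffer: CVPixelBuffer) -> [Float]? {
    var results: [Float]? = nil
    guard let prediction = try? mlmodel!.prediction(input: pixelbuffer) else {
        os_log("Prediction failed", type: .error)
        return nil
    }

    // Read the floats straight from the output MLMultiArray instead of
    // materializing an MLShapedArray copy via outputShapedArray.
    let output = prediction.output
    if let bufferPointer = try? UnsafeBufferPointer<Float>(output) {
        results = Array(bufferPointer)
    }
    return results
}

On deployment targets that support it, MLMultiArray's withUnsafeBufferPointer(ofType:) offers an equivalent way to copy the floats out without going through MLShapedArray.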