I am trying to improve the performance of drawing the skeleton with body tracking via VNDetectHumanBodyPoseRequest, even when the subject is more than 5 metres away and the iPhone XS camera is held steady.
The tracking shows low confidence for my lower right limbs, noticeable lag, and jitter. I am unable to replicate the performance showcased in this year's WWDC demo video.
Here is the relevant code, adapted from Apple's sample code:
import CoreMedia
import Vision

class Predictor {

    func extractPoses(_ sampleBuffer: CMSampleBuffer) throws -> [VNRecognizedPointsObservation] {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .down)
        let request = VNDetectHumanBodyPoseRequest()

        do {
            // Perform the body pose-detection request.
            try requestHandler.perform([request])
        } catch {
            print("Unable to perform the request: \(error).\n")
        }

        return (request.results as? [VNRecognizedPointsObservation]) ?? []
    }
}
I've captured the video data and am handling the sample buffers here:
import AVFoundation
import UIKit
import Vision

class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let observations = try? predictor.extractPoses(sampleBuffer)
        observations?.forEach { processObservation($0) }
    }

    func processObservation(_ observation: VNRecognizedPointsObservation) {
        // Retrieve all torso points.
        guard let recognizedPoints =
                try? observation.recognizedPoints(forGroupKey: .all) else {
            return
        }

        let storedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
            return (key.rawValue, point.location)
        })

        DispatchQueue.main.sync {
            let mappedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
                guard point.confidence > 0.1 else { return nil }
                // Convert the normalized Vision coordinates into drawing-view coordinates.
                let norm = VNImagePointForNormalizedPoint(point.location,
                                                          Int(drawingView.bounds.width),
                                                          Int(drawingView.bounds.height))
                return (key.rawValue, norm)
            })

            let time = 1000 * observation.timeRange.start.seconds

            // Draw the points onscreen.
            DispatchQueue.main.async {
                self.drawingView.draw(points: mappedPoints)
            }
        }
    }
}
The drawingView.draw(points:) function is on a custom UIView overlaid on top of the camera view, and it draws the points using CALayer sublayers. The AVCaptureSession code is exactly the same as the sample code here.
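To give a sense of the drawing path, the overlay view does roughly the following (a simplified sketch rather than the actual implementation; names like DrawingView and jointLayers are illustrative, and the real view also draws the connecting bones):

import UIKit
import QuartzCore

// Simplified sketch of the overlay view described above.
// It keeps one CAShapeLayer per joint and repositions the layers on every frame.
class DrawingView: UIView {

    private var jointLayers: [String: CAShapeLayer] = [:]

    func draw(points: [String: CGPoint]) {
        // Drop layers for joints that were not detected this frame.
        for (name, layer) in jointLayers where points[name] == nil {
            layer.removeFromSuperlayer()
            jointLayers[name] = nil
        }

        for (name, point) in points {
            // Reuse the existing dot layer for this joint, or create one.
            let dot: CAShapeLayer
            if let existing = jointLayers[name] {
                dot = existing
            } else {
                dot = CAShapeLayer()
                dot.path = UIBezierPath(ovalIn: CGRect(x: -4, y: -4, width: 8, height: 8)).cgPath
                dot.fillColor = UIColor.systemGreen.cgColor
                layer.addSublayer(dot)
                jointLayers[name] = dot
            }

            // Disable implicit animations so the dot does not trail behind the video.
            CATransaction.begin()
            CATransaction.setDisableActions(true)
            dot.position = point
            CATransaction.commit()
        }
    }
}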
I tried using the VNDetectHumanBodyPoseRequest(completionHandler:) variant, but it made no difference to the performance for me. I could try smoothing with a moving average filter, but there would still be a problem with outlier predictions, which are very inaccurate.
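The smoothing I have in mind would be something along these lines (just a sketch; PointSmoother and windowSize are illustrative names, and it does nothing about the outliers):

import CoreGraphics

// Per-joint moving average over the last few frames (illustrative helper, not from Apple's sample).
struct PointSmoother {

    private let windowSize = 5
    private var history: [String: [CGPoint]] = [:]

    // Append the latest point for each joint and return the average over the window.
    mutating func smooth(_ points: [String: CGPoint]) -> [String: CGPoint] {
        var smoothed: [String: CGPoint] = [:]

        for (joint, point) in points {
            var window = history[joint, default: []]
            window.append(point)
            if window.count > windowSize { window.removeFirst() }
            history[joint] = window

            let sum = window.reduce(CGPoint.zero) { CGPoint(x: $0.x + $1.x, y: $0.y + $1.y) }
            smoothed[joint] = CGPoint(x: sum.x / CGFloat(window.count),
                                      y: sum.y / CGFloat(window.count))
        }

        return smoothed
    }
}

A filter like this would damp the jitter but smear the outliers across several frames rather than remove them.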
What am I missing?
This was a bug in iOS 14 beta 1-3, I think. After upgrading to beta 4 and the later beta releases, tracking is much better. The API also became a bit clearer, with more fine-grained type names, in the latest beta updates.
Note that I didn't get an official answer from Apple regarding this bug, but the problem will probably disappear completely in the official iOS 14 release.
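For example, with the later betas the request results come back as strongly typed pose observations, so the extraction can be written roughly like this (my sketch of the newer API, not Apple's sample code; the 0.1 confidence threshold is arbitrary):

import CoreGraphics
import CoreMedia
import Vision

// Sketch using the fine-grained types from the later iOS 14 betas.
func extractPoses(_ sampleBuffer: CMSampleBuffer) -> [VNHumanBodyPoseObservation] {
    let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .down)
    let request = VNDetectHumanBodyPoseRequest()

    try? requestHandler.perform([request])

    // results is typed as [VNHumanBodyPoseObservation]?, so no casting is needed.
    return request.results ?? []
}

// Joints are now addressed by typed names instead of raw string keys.
func rightWristPoint(in observation: VNHumanBodyPoseObservation) -> CGPoint? {
    guard let points = try? observation.recognizedPoints(.all),
          let wrist = points[.rightWrist],
          wrist.confidence > 0.1 else { return nil }
    return wrist.location
}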