arkit, visionos, hand-tracking

Predicted hand tracking in visionOS 2.0


I would like to know how predicted hand tracking works in ARKit. Does it use a Kalman filter, or does it take another approach, such as machine learning?

I searched a lot but found no paper or website explaining how this prediction works.


Solution

  • Two competing requirements of skeletal hand tracking are accuracy and latency: whatever filtering/ML/solver algorithm you choose, you trade low latency for high accuracy, or vice versa. The Kalman filter/predictor strikes a good balance between the two. In this Disney Research work, the team used 22 individual Kalman filters, one per joint, all running simultaneously. It's a shame that Apple very rarely publishes scientific papers about the principles its products and APIs are based on; still, I'm 99% sure it's a Kalman filter/predictor that's used in ARKit/RealityKit body tracking and hand tracking. Why reinvent the wheel? (A per-joint sketch of the idea follows after this answer.)

    I would like to add that when the .predicted tracking mode is used, nothing prevents Cupertino engineers from combining several solutions: a Kalman predictor (which uses past data to extrapolate a joint's motion), predictive ML models (for when some joints are occluded), and inverse kinematics driven by a specific solver. A minimal example of opting into the .predicted mode follows the Kalman sketch below.
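
For illustration only — this is not Apple's implementation, and all type names and noise values are my own assumptions — here is a minimal sketch of the kind of per-joint Kalman filtering/prediction described above: a 1-D constant-velocity Kalman filter, three of which (x, y, z) filter one joint's position and can extrapolate it a few milliseconds into the future.

```swift
import simd

/// A 1-D constant-velocity Kalman filter (state = position + velocity).
/// Three of these — one per axis — filter a single joint's 3-D position.
struct KalmanFilter1D {
    // State estimate.
    var position: Float = 0
    var velocity: Float = 0

    // 2x2 state covariance [[pPP, pPV], [pPV, pVV]].
    var pPP: Float = 1, pPV: Float = 0, pVV: Float = 1

    // Tunable noise levels (assumed values, purely illustrative).
    let processNoise: Float = 1e-3      // uncertainty added by the motion model per step
    let measurementNoise: Float = 1e-2  // noise of the raw joint samples

    /// Predict the state `dt` seconds ahead — the step that lets a tracker
    /// report a pose slightly in the future.
    mutating func predict(dt: Float) {
        position += velocity * dt

        // Covariance propagation for F = [[1, dt], [0, 1]], plus process noise.
        let newPPP = pPP + dt * (2 * pPV + dt * pVV) + processNoise
        let newPPV = pPV + dt * pVV
        let newPVV = pVV + processNoise
        (pPP, pPV, pVV) = (newPPP, newPPV, newPVV)
    }

    /// Correct the state with a new measurement (H = [1, 0]).
    mutating func update(measurement: Float) {
        let innovation = measurement - position
        let s = pPP + measurementNoise          // innovation covariance
        let kP = pPP / s                        // Kalman gain for position
        let kV = pPV / s                        // Kalman gain for velocity

        position += kP * innovation
        velocity += kV * innovation

        // Covariance update: P = (I - K H) P.
        let newPPP = (1 - kP) * pPP
        let newPPV = (1 - kP) * pPV
        let newPVV = pVV - kV * pPV
        (pPP, pPV, pVV) = (newPPP, newPPV, newPVV)
    }
}

/// One filtered joint: three independent 1-D filters, one per axis.
struct JointFilter {
    var axes = [KalmanFilter1D(), KalmanFilter1D(), KalmanFilter1D()]

    /// Feed in a raw joint position sampled by the tracker.
    mutating func ingest(_ measured: SIMD3<Float>, dt: Float) {
        for i in 0..<3 {
            axes[i].predict(dt: dt)
            axes[i].update(measurement: measured[i])
        }
    }

    /// Extrapolate the joint `lookahead` seconds into the future
    /// without a new measurement — i.e. a "predicted" pose.
    func predictedPosition(lookahead: Float) -> SIMD3<Float> {
        SIMD3<Float>(
            axes[0].position + axes[0].velocity * lookahead,
            axes[1].position + axes[1].velocity * lookahead,
            axes[2].position + axes[2].velocity * lookahead
        )
    }
}
```

A real hand tracker would run one such filter bank per joint of the skeleton (the Disney paper mentioned above used 22), and would likely use a full 3-D state with cross-axis covariance rather than three independent scalars.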
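
And, purely as an assumption about how you would opt into that mode in RealityKit on visionOS 2.0 — API names here are from memory, so verify them against the current documentation:

```swift
import RealityKit

// A hand-anchored entity that requests the predicted tracking mode,
// trading a little accuracy for lower perceived latency.
let handEntity = Entity()
handEntity.components.set(
    AnchoringComponent(
        .hand(.left, location: .palm),   // anchor to the left palm
        trackingMode: .predicted         // forward-predicted pose instead of .continuous
    )
)
```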