normalization mediapipe

How to Normalize Hand Landmark Positions in Video Frames Using MediaPipe?


I am working on a project where I need to track and analyze hand movements across all frames of multiple videos using MediaPipe. The challenge is that the subject's distance from the camera varies, so the size of the detected hand changes from frame to frame, and because the hand also moves around I can't rely on the raw positions of the landmarks either. I want to standardize the size of the hand landmarks across frames so that I can compare movements more accurately.

How can I normalize the positions and/or sizes of hand landmarks detected in a video, considering changes in orientation and distance from the camera? I'm looking for a method to adjust for scale, rotation, and translation of the hand landmarks.


Solution

  • You can normalise your coordinates to account for scale and translation very easily:

    1. Identify the two landmarks whose x coordinates are furthest apart.
    2. These x coordinates, xmin and xmax, become your normalised values 0 and 1.
    3. Then, for every other landmark, take its x coordinate xn and apply the usual min-max normalisation formula: (xn - xmin)/(xmax - xmin)
    4. Repeat the process for the y-axis.
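
    The steps above can be sketched in Python with NumPy. This is only a minimal sketch: the function name min_max_normalise is made up for illustration, and it assumes you have already copied the x and y values of the 21 MediaPipe hand landmarks of one frame into an (N, 2) array.

```python
import numpy as np


def min_max_normalise(landmarks):
    """Rescale (N, 2) landmark coordinates to the range [0, 1] on each axis.

    `landmarks` is assumed to hold one row per landmark: [x, y].
    """
    pts = np.asarray(landmarks, dtype=float)

    # Steps 1 and 2: the extreme coordinates on each axis map to 0 and 1.
    mins = pts.min(axis=0)
    maxs = pts.max(axis=0)

    # Guard against a zero range (all landmarks sharing one coordinate).
    span = maxs - mins
    span[span == 0] = 1.0

    # Steps 3 and 4: (xn - xmin) / (xmax - xmin), applied to x and y at once.
    return (pts - mins) / span
```

    Because each axis is rescaled independently, the aspect ratio of the hand is not preserved. If that matters for your comparison, dividing both axes by the larger of the two ranges is one alternative.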

    Now all coordinates are normalised. Note, however, that rotating your hand will still change the normalised coordinates, and rotation is trickier to work around.

    It is quite a tricky task, so the complexity you want to add to the normalisation depends solely on your use case. Good luck! And may the code be with you...
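
    If in-plane rotation (the hand turning within the image plane) is the part that matters for your use case, one possible approach, which is not part of the steps above, is to rotate every frame so that a reference direction on the hand always points the same way. The sketch below is an assumption-laden illustration: the function name align_rotation is made up, it uses the wrist (index 0) and the middle finger MCP joint (index 9) of the MediaPipe hand model as the reference, and it does nothing about the hand tilting towards or away from the camera.

```python
import numpy as np

WRIST = 0        # MediaPipe hand landmark index of the wrist
MIDDLE_MCP = 9   # MediaPipe hand landmark index of the middle finger MCP joint


def align_rotation(landmarks):
    """Translate the wrist to the origin and rotate the hand to point 'up'.

    `landmarks` is assumed to be a (21, 2) array of [x, y] rows. MediaPipe
    scales x by image width and y by image height, so for non-square frames
    it is safer to convert to pixel coordinates before calling this.
    """
    pts = np.asarray(landmarks, dtype=float)

    # Translation: express every landmark relative to the wrist.
    centred = pts - pts[WRIST]

    # Current direction of the wrist -> middle MCP vector.
    ref = centred[MIDDLE_MCP]
    angle = np.arctan2(ref[1], ref[0])

    # Target direction: straight up in image coordinates (y grows downwards).
    theta = -np.pi / 2 - angle

    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])

    # Rotate all landmarks (rows are points, hence the transpose).
    return centred @ rot.T
```

    Applying the min-max scaling after this alignment would then cover translation, in-plane rotation and scale together. Out-of-plane rotation remains unsolved here.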