I am trying to build a pose-estimation-based model capable of identifying incorrect movement in a pose relative to a predefined action, for example exercises such as squats, pull-ups, yoga, etc. If the user is not performing the action as instructed, I need to find the points in the pose where they deviate.
What I have tried so far: I built rule-based logic that identifies the direction and angles between two joints/lines and gives corrections.
But the problem with this is that we can't write rules for every frame in an action sequence, so I am looking for a better solution.
This idea could work if the timing and sequence of the motions are not important. It also assumes you know which motion is being attempted (though it could be adapted into a kind of brute-force classifier as well):
Create a way to record continuous motions in the way you described (angles of indexed joints). Then, to train a new motion, collect a "golden set" where that motion is performed as perfectly as possible several times (5-10; one would work fine as well if you can ensure a high-quality sample). Combine those sets (concatenate them; don't average or anything). You can think of each data point as a high-dimensional point, like xyz but with as many dimensions as joint angles you are tracking. If speed/optimization is a concern, you will probably want to sort this data to improve subsequent searching.
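A minimal sketch of that recording/combining step, assuming NumPy; `record_motion`, the frame count, and the joint count are hypothetical placeholders for your own capture pipeline:

```python
import numpy as np

def record_motion(num_frames: int, num_joints: int) -> np.ndarray:
    """Placeholder: capture one performance of the motion.

    Should return an array of shape (num_frames, num_joints), one joint
    angle per tracked joint per frame (degrees or radians, just be consistent).
    """
    raise NotImplementedError("wire this to your pose-estimation pipeline")

def build_golden_set(num_takes: int = 5, num_joints: int = 8) -> np.ndarray:
    """Concatenate several 'as perfect as possible' performances.

    Each frame becomes one high-dimensional point. We stack the takes
    rather than averaging them, so natural variation between takes is
    preserved as extra valid points in the golden set.
    """
    takes = [record_motion(num_frames=120, num_joints=num_joints)
             for _ in range(num_takes)]
    return np.concatenate(takes, axis=0)  # shape: (total_frames, num_joints)
```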
It depends on which joints you care about, but if some joints move much more than others, you may want to normalize the angle data per joint (i.e., instead of using raw values, divide each joint's values by its total range over the motion). The benefit is that this keeps one joint with a huge range of motion from overpowering another whose range is much smaller but still important. One caution: joints with very little motion can have their noise amplified significantly by normalization, so you should probably just zero out any joint whose range over the motion falls below some threshold.
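A sketch of that per-joint normalization; `min_range` is an assumed noise threshold you would tune for your data:

```python
import numpy as np

def normalize_per_joint(golden: np.ndarray, min_range: float = 5.0):
    """Normalize golden data per joint; return the stats needed to
    normalize live frames identically. golden has shape (frames, joints)."""
    lo = golden.min(axis=0)
    rng = golden.max(axis=0) - lo
    active = rng >= min_range            # joints that move enough to matter
    scale = np.where(active, rng, 1.0)   # avoid divide-by-zero on static joints
    normalized = (golden - lo) / scale
    normalized[:, ~active] = 0.0         # zero near-static joints (noise)
    return normalized, lo, scale, active

def normalize_live(frame: np.ndarray, lo, scale, active) -> np.ndarray:
    """Apply the exact same normalization to a single live frame."""
    out = (frame - lo) / scale
    out[~active] = 0.0
    return out
```

Keeping `lo`, `scale`, and `active` around matters: the live data must be transformed with the golden set's statistics, not its own, or the comparison is meaningless.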
Now, while someone is performing the motion, take a live sample of the user data (joint angles, with each joint normalized the same way as in the golden data if you performed normalization) and find the "closest" point in your combined golden sample. In the simplest sense, "closeness" can be a high-dimensional distance estimate. By comparing the live data to the closest golden point, you can then inform the user of the ways their current pose is inaccurate. Squared distance can be used since it is only for ranking, so all you need to do is find the difference in each dimension, i.e. (angle1Difference, angle2Difference, angle3Difference, ...), and sum their squares: distSq = a1D*a1D + a2D*a2D + a3D*a3D + ...
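A sketch of that live comparison step; the joint names and the `tol` threshold are hypothetical:

```python
import numpy as np

JOINT_NAMES = ["left_knee", "right_knee", "left_hip", "right_hip"]  # example only

def closest_golden(live: np.ndarray, golden: np.ndarray):
    """live: shape (joints,); golden: shape (frames, joints).
    Returns the index of the nearest golden frame and the per-joint
    differences against it."""
    diffs = golden - live                # per-frame, per-joint differences
    dist_sq = (diffs ** 2).sum(axis=1)   # a1D*a1D + a2D*a2D + ... (no sqrt needed)
    best = int(np.argmin(dist_sq))
    return best, diffs[best]

def correction_hints(joint_diffs: np.ndarray, tol: float = 0.1):
    """Turn per-joint differences into human-readable corrections,
    ignoring joints within tolerance."""
    hints = []
    for name, d in zip(JOINT_NAMES, joint_diffs):
        if abs(d) > tol:
            direction = "increase" if d > 0 else "decrease"
            hints.append(f"{direction} {name} angle by ~{abs(d):.2f} (normalized units)")
    return hints
```

Skipping the square root is fine because you only need the *ranking* of distances to pick the nearest golden frame, and squaring is monotonic for non-negative values.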
Note that for a given data point you need to capture all the joints you care about together. Independent per-joint data sets are less meaningful, since they could allow the right ranges of motion but in the wrong order or coordination.