When implementing monocular SLAM or Structure from Motion with a single camera, translation can only be estimated up to an unknown scale. It is known that, without any external information, this absolute scale cannot be determined. My question, however, is how to make this scale consistent across all the relative translations. For example, with 3 frames (Frame0, Frame1 and Frame2), we apply tracking as follows: Frame0 to Frame1 gives T01, and Frame1 to Frame2 gives T12.
The problem is that T01 and T12 are normalized, so their magnitude is 1. In reality, however, |T01| may be twice |T12|.
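The normalization comes from the epipolar constraint itself. With normalized image coordinates x1, x2 and the essential matrix E built from R and t, scaling t by any nonzero factor leaves the constraint unchanged, so a decomposition of E can only return t with an arbitrary convention such as ||t|| = 1:

```latex
E = [\mathbf{t}]_{\times} R, \qquad \mathbf{x}_2^{\top} E \, \mathbf{x}_1 = 0
\;\Longrightarrow\;
\mathbf{x}_2^{\top} \bigl( [\lambda \mathbf{t}]_{\times} R \bigr) \mathbf{x}_1
= \lambda \, \mathbf{x}_2^{\top} E \, \mathbf{x}_1 = 0
\quad \text{for every } \lambda \neq 0
```

Each pair of frames is therefore normalized independently, which is why T01 and T12 carry no information about each other's length.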
How can I recover the relative magnitude between T01 and T12?
P.S. I do not need the exact values of T01 or T12. I only want to know, for example, that |T01| = 2 * |T12|.
I think it is possible, because monocular SLAM and SfM algorithms already exist and work well, so there must be some way to do this.
Also calculate R, t between frames 0 and 2, and form a triangle whose three vertices are the camera positions of the three frames. Given the three translation directions, there is only one closed triangle (up to a single global scale), so closing it fixes the relative magnitudes of T01 and T12.
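A minimal sketch of the triangle-closure idea, assuming the three inputs are the directions of the camera-centre displacements (c1 - c0, c2 - c1, c2 - c0) already expressed in one common frame (e.g. Frame0). That conversion is an assumption left to the caller: with the convention X2 = R X1 + t, the centre displacement in the first camera's coordinates is -R^T t, and d12 must additionally be rotated into Frame0. Fixing |T01| = 1, the closure d01 + s12*d12 = s02*d02 gives 3 equations in 2 unknowns, solved here by least squares. The helper name `relative_scales` is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def relative_scales(d01, d12, d02):
    """Return (s12, s02) with d01 + s12*d12 ~= s02*d02, taking |T01| = 1.

    Assumption: d01, d12, d02 are camera-centre displacement directions
    (c1-c0, c2-c1, c2-c0) expressed in the SAME coordinate frame.
    """
    d01, d12, d02 = unit(d01), unit(d12), unit(d02)
    # Rewrite closure as A x = b with x = (s12, s02):
    #   s12*d12 - s02*d02 = -d01
    a1 = d12
    a2 = [-v for v in d02]
    b = [-v for v in d01]
    # Least squares via the 2x2 normal equations (A^T A) x = A^T b.
    m11, m12, m22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = m11 * m22 - m12 * m12
    if abs(det) < 1e-12:
        # d12 parallel to d02: e.g. all camera centres collinear, so the
        # "triangle" is degenerate and the ratio is not determined.
        raise ValueError("degenerate configuration: triangle does not fix the scale ratio")
    # Cramer's rule on the 2x2 system.
    s12 = (r1 * m22 - m12 * r2) / det
    s02 = (m11 * r2 - m12 * r1) / det
    return s12, s02


if __name__ == "__main__":
    # Synthetic check: c0 = (0,0,0), c1 = (2,0,0), c2 = (3,1,0).
    s12, s02 = relative_scales([1, 0, 0], [1, 1, 0], [3, 1, 0])
    print(f"|T01| / |T12| = {1.0 / s12:.4f}")  # prints "|T01| / |T12| = 1.4142"
```

In the synthetic example |T01| = 2 and |T12| = sqrt(2), so the recovered ratio is sqrt(2). A negative s12 or s02 would suggest an inconsistent sign on one of the translations. When the motion is (nearly) straight-line, the three directions are almost parallel and this construction gives little or no constraint on the ratio.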