Let's say I have a given number n of cameras with known intrinsics. Each camera is synchronized, and each camera detected t frames with a calibration target that was rotated and moved around, seen simultaneously by every camera.
For each frame, I have some objp, imgp, where imgp are detected points of the calibration target. I am using them to compute extrinsics [R|t] (with cv2.solvePnP).
How do I average together all this data? If I understand correctly, I can't simply do quaternion averaging and translation averaging on the raw results of cv2.solvePnP, since - as the target is rotated and moved around - the frame of reference will be different.
I tried changing the frame of reference first, as shown below:
def get_extrinsics_fixed_to_global_reference(R_ref, t_ref, R_other, t_other):
    # Get the extrinsics of the other camera in the reference camera's coordinate system
    R = R_other @ R_ref.T
    t = t_other - R @ t_ref
    return R, t
but I got garbage results. I am currently using a setup with n = 2 cameras.
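A quick way to check the change of reference above is to run it on synthetic poses: for a rigid rig, the relative pose should come out identical for every target pose. The following is a minimal sketch using NumPy only. The helper `random_rotation` and the ground-truth values are made up for illustration, and the sketch assumes each `R` is a 3x3 matrix (for example from `cv2.Rodrigues(rvec)`) and each `t` has shape `(3,)`.

```python
import numpy as np

def relative_pose(R_ref, t_ref, R_other, t_other):
    # Same formula as in the question: maps points from the reference
    # camera's frame into the other camera's frame.
    R = R_other @ R_ref.T
    t = t_other - R @ t_ref
    return R, t

def random_rotation(rng):
    # Hypothetical helper: QR of a random matrix gives an orthonormal basis;
    # flip one column if needed so that det = +1 (a proper rotation).
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(0)
R_rel_true = random_rotation(rng)          # made-up cam_0 -> cam_1 rotation
t_rel_true = np.array([0.3, 0.0, 0.05])    # made-up cam_0 -> cam_1 translation

max_err = 0.0
for _ in range(5):
    # Random target pose as seen by cam_0 ...
    R0, t0 = random_rotation(rng), rng.normal(size=3)
    # ... and the same target pose as seen by cam_1.
    R1 = R_rel_true @ R0
    t1 = R_rel_true @ t0 + t_rel_true
    R, t = relative_pose(R0, t0, R1, t1)
    max_err = max(max_err, np.abs(R - R_rel_true).max(), np.abs(t - t_rel_true).max())

print(max_err)  # should be at floating-point noise level
```

If this check passes but real data still looks wrong, two things worth ruling out are passing the `rvec` returned by `cv2.solvePnP` where a rotation matrix is expected, and mixing `(3,)` and `(3, 1)` translation shapes, which broadcasts silently in NumPy.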
Edit: To clarify, my idea was to essentially compare the resulting transformations between the coordinate systems of cam_0 and cam_k for k > 0. For each set of frames (corresponding to a calibration target tilted in a specific way) I would get a set of said transformations (cam_0 -> cam_k), and in theory they should be the same - this is where I would average out the rotation matrices to then finally get extrinsics. Note: I already know intrinsics for each camera.
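The averaging step described above could be sketched roughly as follows. This uses the chordal mean (sum the rotation matrices, then project the sum back onto a valid rotation with an SVD), which is only one of several ways to average rotations and is an assumption here, not something stated above. The names `average_rotations` and `rot_z` and the sample values are illustrative.

```python
import numpy as np

def average_rotations(Rs):
    # Chordal mean: the element-wise sum is generally not a rotation,
    # so project it onto the closest rotation matrix via SVD.
    M = np.sum(Rs, axis=0)
    U, _, Vt = np.linalg.svd(M)
    # Force det = +1 so the result is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def rot_z(a):
    # Rotation by angle a (radians) about the z axis, for the demo only.
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two made-up per-frame estimates of the same cam_0 -> cam_k transform.
R_mean = average_rotations([rot_z(0.2), rot_z(0.4)])
t_mean = np.mean([[0.30, 0.0, 0.05], [0.32, 0.0, 0.05]], axis=0)
```

Averaging the raw matrix entries without the projection step would not give a valid rotation, which is why the SVD is there.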
Edit 2: I think I fixed it, below is a visualization of what I meant:
I think you are operating under a fundamental misunderstanding: statistics yield useful metrics to evaluate the performance of a system after one has built such a system. They are not, in general, part of the system.
In your case, the statistic you mention is some sort of "average", and the system is a calibrated multi-camera rig and associated software. Averaging some partial calibration data you have computed won't generally improve your performance, because the performance goal is obtained by computing an optimum (namely, the set of extrinsic parameters that minimize the reprojection errors), not by running a poll asking each camera what it thinks the solution is. There is an underlying physical truth you are trying to discover, not an election to win.
The correct way to improve the accuracy of the extrinsic calibration of your multi-camera setup is to merge all the data you have collected (the target point coordinates) into a single dataset, and then run a global bundle adjustment procedure on it.
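To make the idea concrete, here is a rough sketch of such a joint refinement for n = 2 cameras, on synthetic data. It is only an illustration under stated assumptions: an ideal pinhole model without lens distortion, a hand-rolled Levenberg-Marquardt loop with a finite-difference Jacobian, and made-up intrinsics and poses. All function names (`rodrigues`, `project`, `residuals`, `refine`) are hypothetical helpers, not library calls. The unknowns are the single cam_0 -> cam_1 transform plus one target pose per frame, and the cost is the reprojection error over all points, frames and cameras.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (same convention as cv2.Rodrigues)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(K_cam, R, t, pts):
    """Ideal pinhole projection (assumption: lens distortion is ignored)."""
    uvw = (pts @ R.T + t) @ K_cam.T
    return uvw[:, :2] / uvw[:, 2:3]

def residuals(x, objp, imgp0, imgp1, K0, K1):
    """Reprojection errors of every point, in every frame, in both cameras."""
    R_rel, t_rel = rodrigues(x[:3]), x[3:6]      # cam_0 -> cam_1, shared by all frames
    res = []
    for i, (uv0, uv1) in enumerate(zip(imgp0, imgp1)):
        p = x[6 + 6 * i: 12 + 6 * i]             # target pose in cam_0 for frame i
        R_i, t_i = rodrigues(p[:3]), p[3:]
        res.append(project(K0, R_i, t_i, objp) - uv0)
        res.append(project(K1, R_rel @ R_i, R_rel @ t_i + t_rel, objp) - uv1)
    return np.concatenate(res).ravel()

def refine(x0, *args, iters=30, eps=1e-6):
    """Small Levenberg-Marquardt loop with a forward-difference Jacobian."""
    x, lam = x0.copy(), 1e-3
    for _ in range(iters):
        r = residuals(x, *args)
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (residuals(x + dx, *args) - r) / eps
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -J.T @ r)
        if np.sum(residuals(x + step, *args) ** 2) < np.sum(r ** 2):
            x, lam = x + step, lam * 0.1         # accept the step, trust the model more
        else:
            lam *= 10.0                          # reject the step, damp harder
    return x

# Synthetic setup: all numbers below are made up for the demo.
rng = np.random.default_rng(0)
K0 = K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
gx, gy = np.meshgrid(np.arange(4), np.arange(3))
objp = 0.05 * np.stack([gx.ravel(), gy.ravel(), np.zeros(12)], axis=1)  # planar 4x3 grid

rel_true = np.array([0.0, 0.3, 0.0, -0.2, 0.0, 0.03])    # rvec, tvec of cam_0 -> cam_1
frames_true = [np.concatenate([rng.normal(scale=0.3, size=3),
                               rng.uniform(-0.1, 0.1, size=2), [1.0 + 0.1 * i]])
               for i in range(4)]
x_true = np.concatenate([rel_true] + frames_true)

R_rel, t_rel = rodrigues(rel_true[:3]), rel_true[3:]
imgp0 = [project(K0, rodrigues(f[:3]), f[3:], objp) for f in frames_true]
imgp1 = [project(K1, R_rel @ rodrigues(f[:3]), R_rel @ f[3:] + t_rel, objp)
         for f in frames_true]

# Start from a perturbed guess (in practice: per-frame solvePnP results).
x0 = x_true + rng.normal(scale=0.01, size=x_true.size)
x_hat = refine(x0, objp, imgp0, imgp1, K0, K1)
```

In a real pipeline you would more likely hand `residuals` to an existing solver such as `scipy.optimize.least_squares`, or, for exactly two cameras, let `cv2.stereoCalibrate` with the `cv2.CALIB_FIX_INTRINSIC` flag do the joint optimization. The point of the sketch is only that the rig transform is a single shared unknown estimated from all observations at once, rather than an average of per-frame estimates.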
See, for more info: