python opencv computer-vision camera-calibration

How to average camera extrinsics, each computed from a target detected at a different angle?

Let's say I have n cameras with known intrinsics. The cameras are synchronized, and each camera captured t frames of a calibration target that was rotated and moved around, with every frame seen simultaneously by all cameras.

For each frame, I have objp and imgp, where objp are the 3D coordinates of the target points in the target's own frame and imgp are the corresponding detected image points. I use them to compute the extrinsics [R|t] with cv2.solvePnP.

How do I average all of this data? If I understand correctly, I can't simply average the quaternions and translations of the raw cv2.solvePnP results because, as the target is rotated and moved around, the frame of reference changes from one frame to the next.
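A small synthetic check makes the point concrete. It assumes two rigidly mounted cameras and noise-free poses, and uses only NumPy: the helper `rodrigues` is a hypothetical stand-in for `cv2.Rodrigues` so the example runs without OpenCV, and the rig and target poses are made up for illustration. The per-camera extrinsics change with every target pose, but the composed cam_0 -> cam_1 transform is the same in every frame, which is why that quantity (not the raw solvePnP output) is what can be compared across frames.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (same convention as cv2.Rodrigues)."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

rng = np.random.default_rng(0)

# Hypothetical fixed rig: X_1 = R_01 @ X_0 + t_01 (cam_0 coordinates -> cam_1 coordinates).
R_01 = rodrigues([0.0, 0.3, 0.05])
t_01 = np.array([[-0.5], [0.0], [0.02]])

raw_R0, rel_R, rel_t = [], [], []
for _ in range(5):
    # Random target pose in cam_0, i.e. what solvePnP would return for cam_0.
    R_0 = rodrigues(rng.normal(scale=0.5, size=3))
    t_0 = np.array([[0.0], [0.0], [2.0]]) + rng.normal(scale=0.1, size=(3, 1))
    # The same target as seen by cam_1: compose the target pose with the rig transform.
    R_1 = R_01 @ R_0
    t_1 = R_01 @ t_0 + t_01
    raw_R0.append(R_0)
    # Relative pose recovered from this single frame (same formula as in the question).
    R = R_1 @ R_0.T
    rel_R.append(R)
    rel_t.append(t_1 - R @ t_0)

# The raw per-frame extrinsics differ from frame to frame...
print(np.allclose(raw_R0[0], raw_R0[1]))              # False
# ...but the relative transform is identical in every frame.
print(all(np.allclose(R, R_01) for R in rel_R))       # True
print(all(np.allclose(t, t_01) for t in rel_t))       # True
```

With real detections each frame's relative pose will differ slightly because of noise, which is what motivates combining them.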

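I tried changing the frame of reference first, as shown below:

import numpy as np

def get_extrinsics_fixed_to_global_reference(R_ref, t_ref, R_other, t_other):
    # Pose of the other camera relative to the reference camera, such that
    # X_other = R @ X_ref + t. Both input poses must come from the SAME frame.
    # R_* are 3x3 rotation matrices (convert solvePnP's rvec with cv2.Rodrigues);
    # t_* are (3, 1) translation vectors (solvePnP's tvec).
    R = R_other @ R_ref.T
    t = t_other - R @ t_ref

    return R, t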

but I got garbage results. I am currently using a setup with n = 2 cameras.

Edit: To clarify, my idea was essentially to compare the resulting transformations between the coordinate systems of cam_0 and cam_k for k > 0. Each set of frames (corresponding to the calibration target tilted in a specific way) gives one such transformation (cam_0 -> cam_k), and in theory they should all be the same. This is where I would average the rotation matrices to finally obtain the extrinsics. Note: I already know the intrinsics of each camera.
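If the per-frame cam_0 -> cam_k estimates are combined by averaging, one common choice (an assumption here, not the only option; quaternion eigenvector averaging is another) is the chordal L2 mean: average the rotation matrices element-wise, then project the result back onto a valid rotation with an SVD, and average the translations arithmetically. The function names `project_to_so3` and `average_relative_poses` are hypothetical. This is a quick combination step, not a minimizer of reprojection error.

```python
import numpy as np

def project_to_so3(M):
    """Closest rotation matrix (in Frobenius norm) to a 3x3 matrix M."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the last singular direction if needed so that det(R) = +1 (no reflection).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def average_relative_poses(Rs, ts):
    """Chordal L2 mean of rotations plus arithmetic mean of translations.

    Rs: list of 3x3 rotation matrices (cam_0 -> cam_k, one per frame)
    ts: list of (3, 1) translation vectors, in the same order as Rs
    """
    R_mean = project_to_so3(np.mean(np.stack(Rs), axis=0))
    t_mean = np.mean(np.stack(ts), axis=0)
    return R_mean, t_mean

def rot_z(a):
    """Rotation by angle a (radians) about the z axis, for the usage example."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Usage: three slightly different per-frame estimates of the same relative pose.
Rs = [rot_z(a) for a in (0.1, 0.2, 0.3)]
ts = [np.array([[x], [0.0], [0.0]]) for x in (-0.49, -0.50, -0.51)]
R_avg, t_avg = average_relative_poses(Rs, ts)
print(np.allclose(R_avg, rot_z(0.2)))  # True
```

The SVD projection is needed because an element-wise mean of rotation matrices is generally not itself a rotation matrix.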

Edit 2: I think I fixed it. Below is a visualization of what I meant:

Solution

I think you are operating under a fundamental misunderstanding: statistics provide useful metrics to evaluate the performance of a computer vision algorithm after one has implemented the algorithm. They are not, in general, a part of the algorithm.

In your case, the statistic you mention is some sort of "average", and the system is a calibrated multi-camera rig with its associated software. Just averaging a collection of partial calibration results you have computed won't generally improve performance, because the performance goal is reached by computing an optimum (namely, the set of extrinsic parameters that minimizes the reprojection errors), not by running a poll asking each camera what it thinks the solution is. There is an underlying physical truth you are trying to discover, not an election to win.
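As a sketch of what that optimum looks like, here is one possible formulation, with notation introduced here rather than taken from the question. Let cam_0 be the reference; let frame $f = 1, \dots, T$ ($T$ is the question's t, renamed to avoid a clash with translations) have unknown target pose $(R_f, \mathbf{t}_f)$ in cam_0; let $(R_{0k}, \mathbf{t}_{0k})$ be the unknown cam_0 -> cam_k transform; let $\mathbf{X}_j$ be the known target points (objp), $\mathbf{u}_{fkj}$ their detections (imgp), $\pi_k$ the projection with camera $k$'s known intrinsics, and $\mathcal{V}_{fk}$ the set of points camera $k$ detected in frame $f$:

$$
\min_{\{R_{0k},\,\mathbf{t}_{0k}\}_{k=1}^{n-1},\ \{R_f,\,\mathbf{t}_f\}_{f=1}^{T}}
\ \sum_{f=1}^{T}\sum_{k=0}^{n-1}\sum_{j\in\mathcal{V}_{fk}}
\left\lVert \mathbf{u}_{fkj} - \pi_k\!\left(R_{0k}\left(R_f\,\mathbf{X}_j + \mathbf{t}_f\right) + \mathbf{t}_{0k}\right)\right\rVert^2,
\qquad R_{00} = I,\ \ \mathbf{t}_{00} = \mathbf{0}
$$

Every detection from every camera and frame enters a single cost, so no frame casts a separate "vote".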

The correct way to improve the accuracy of the extrinsic calibration of your multi-camera setup is to merge all the data you have collected (the target point coordinates and their detections) into a single dataset, and then run a global bundle adjustment procedure on it.
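A minimal sketch of such a joint refinement for the two-camera case, using SciPy's `least_squares` as a stand-in for a dedicated solver such as Ceres. Assumptions: pinhole cameras with known intrinsics and no lens distortion; the unknowns are the cam_0 -> cam_1 transform and the target pose in cam_0 for every frame; the data are synthetic, and in practice the initial guess would come from cv2.solvePnP and the per-frame relative poses. The names `project` and `residuals`, the target layout and the noise levels are all hypothetical. A real implementation would also model distortion and likely use a robust loss.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, R, t, X):
    """Pinhole projection of Nx3 points X with pose (R, t) and intrinsics K."""
    Xc = X @ R.T + t                      # points in camera coordinates, Nx3
    uv = Xc[:, :2] / Xc[:, 2:3]           # normalized image coordinates
    return uv @ K[:2, :2].T + K[:2, 2]    # pixel coordinates, Nx2

def residuals(p, X, obs0, obs1, K0, K1):
    """Stacked reprojection errors of every frame in both cameras.

    p = [rig rotvec (3), rig t (3), then per frame: target rotvec (3), target t (3)]
    """
    R01 = Rotation.from_rotvec(p[0:3]).as_matrix()
    t01 = p[3:6]
    res = []
    for f in range(len(obs0)):
        q = p[6 + 6 * f: 12 + 6 * f]
        Rf = Rotation.from_rotvec(q[:3]).as_matrix()   # target pose in cam_0
        tf = q[3:]
        res.append(project(K0, Rf, tf, X) - obs0[f])
        # The same target in cam_1: compose the target pose with the rig transform.
        res.append(project(K1, R01 @ Rf, R01 @ tf + t01, X) - obs1[f])
    return np.concatenate(res).ravel()

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
# Hypothetical 6x4 planar target with 3 cm spacing (z = 0 in target coordinates).
X = np.array([[i * 0.03, j * 0.03, 0.0] for j in range(4) for i in range(6)])

true_rig = np.array([0.0, 0.2, 0.0, -0.3, 0.0, 0.0])
R01_true = Rotation.from_rotvec(true_rig[:3]).as_matrix()
poses, obs0, obs1 = [], [], []
for _ in range(6):
    rv = rng.normal(scale=0.3, size=3)
    tv = np.array([0.0, 0.0, 1.5]) + rng.normal(scale=0.05, size=3)
    Rf = Rotation.from_rotvec(rv).as_matrix()
    poses.append(np.concatenate([rv, tv]))
    # Detections in both cameras with 0.3 px Gaussian noise.
    obs0.append(project(K, Rf, tv, X) + rng.normal(scale=0.3, size=(len(X), 2)))
    obs1.append(project(K, R01_true @ Rf, R01_true @ tv + true_rig[3:], X)
                + rng.normal(scale=0.3, size=(len(X), 2)))

# Start from a perturbed guess and refine all unknowns jointly.
p0 = np.concatenate([true_rig + 0.05] + [q + 0.02 for q in poses])
sol = least_squares(residuals, p0, args=(X, obs0, obs1, K, K))
print(np.round(sol.x[:6], 3))   # refined rig: rotvec, then translation
```

The rig transform is shared by all frames, so every detection constrains it at once; this is the "single dataset" idea, with the per-frame target poses estimated alongside it.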

See, for more info:

Old but still good: Bill Triggs's review of bundle adjustment
Software: Ceres solver