computer-vision · homography · image-stitching · 360-panorama · projective-geometry

Compute Homography Matrix based on intrinsic and extrinsic camera parameters


I want to perform 360° panorama stitching for 6 fish-eye cameras.

In order to find the relation among the cameras, I need to compute the homography matrix. The latter is usually computed by detecting features in the images and matching them.

However, for my camera setup I already know:

  • the intrinsic matrix K_i of each camera;
  • the extrinsics of each camera, i.e. its rotation R_i and translation t_i with respect to a common reference frame.
Therefore, I think I could compute the homography matrix directly from these parameters, which I assume would be more accurate than feature matching.

In the literature I found the following formula for the homography matrix which relates image 2 to image 1:

H_2_1 = (K_2) * (R_2)^-1 * R_1 * K_1

This formula only takes into account the rotation between the cameras, not the translation vector that exists in my case.

How could I plug the translation t of each camera in the computation of H?

I have already tried to compute H without considering the translation, but since the baseline d is greater than 1 meter, the images are not accurately aligned in the panorama picture.

EDIT:

Based on Francesco's answer below, I have the following questions:


Solution

  • You cannot "plug" the translation in: its presence along with a nontrivial rotation mathematically implies that the relationship between images is not a homography.

    However, if the imaged scene is and appears "far enough" from the camera, i.e. if the translations between cameras are small compared to the distances of the scene objects from the cameras, and the cameras' focal lengths are small enough, then you may use the homography induced by a pure rotation as an approximation.

    Your equation is wrong. The correct formula is obtained as follows:

    The product H = K2 * R_2_1 * inv(K1) is the homography induced by the pure rotation R_2_1. This rotation transforms points from frame 1 into frame 2; it is represented by a 3x3 matrix whose columns are the components of the x, y, z axes of frame 1 decomposed in frame 2. If your setup gives you the rotations of all the cameras with respect to a common frame 0, i.e. as R_i_0, then R_2_1 = R_2_0 * transpose(R_1_0).

    Generally speaking, you should use the above homography only as an initial estimate, to be refined by matching points and optimizing. This is because (a) the homography model itself is only an approximation (since it ignores the translation), and (b) the rotations given by the mechanical setup (even a calibrated one) are affected by errors. Using matched pixels to optimize the transformation will minimize the errors where it matters: on the image, rather than in an abstract rotation space.