python, opencv, computer-vision, camera-calibration, homography

Which OpenCV function can be used to compute the BEV perspective transformation given point coordinates and the camera extrinsics/intrinsics?


I have the 3x3 intrinsic and 3x4 extrinsic matrices for my camera, obtained via cv2.calibrateCamera().

Now I want to use these parameters to compute the BEV (Bird's-Eye View) transformation for any given coordinates in a frame obtained from the camera.

Which OpenCV function can be used to compute the BEV perspective transformation for given point coordinates and the camera extrinsic and/or intrinsic matrices?
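For reference, this is roughly how I assemble the 3x4 extrinsic matrix [R | t]; the rvec/tvec values below are made up, in practice they are the per-view rotation/translation vectors returned by cv2.calibrateCamera():

import cv2
import numpy as np

# Made-up rvec/tvec purely to illustrate the conversion; in practice these come
# from cv2.calibrateCamera() (rvecs[i], tvecs[i] for the i-th calibration view)
rvec = np.array([[0.1], [-0.2], [0.05]], dtype=np.float32)
tvec = np.array([[10.0], [5.0], [50.0]], dtype=np.float32)

rotation_matrix, _ = cv2.Rodrigues(rvec)          # 3x3 rotation matrix from the Rodrigues vector
extrinsics = np.hstack((rotation_matrix, tvec))   # 3x4 extrinsic matrix [R | t]
print(extrinsics.shape)                           # (3, 4)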

I found something very related in the following post: https://deepnote.com/article/social-distancing-detector/, which is based on https://www.pyimagesearch.com/2014/08/25/4-point-opencv-getperspective-transform-example/.

They use cv2.getPerspectiveTransform() to get a 3x3 matrix, but I don't know whether this matrix represents the intrinsics, the extrinsics, or something else. Then they transform the list of points using that matrix in the following way:

#Assuming list_downoids is the list of points to be transformed and matrix is the one obtained above
list_points_to_detect = np.float32(list_downoids).reshape(-1, 1, 2)
transformed_points = cv2.perspectiveTransform(list_points_to_detect, matrix)
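If I understand those posts correctly, that 3x3 matrix is estimated purely from four hand-picked point correspondences, something like this (the corner coordinates below are placeholders, not values from the articles):

import cv2
import numpy as np

# Four points in the original frame (e.g. the corners of the ground region)...
src_points = np.float32([[200, 300], [1100, 300], [1250, 700], [50, 700]])
# ...and where those points should land in the bird's-eye view
dst_points = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

# 3x3 homography mapping the source quadrilateral onto the destination rectangle;
# it is neither the intrinsic nor the extrinsic matrix on its own
matrix = cv2.getPerspectiveTransform(src_points, dst_points)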

I really need to know whether I can use this cv2.perspectiveTransform function to compute the transformation, or whether there is a better way to do it using the extrinsics, the intrinsics, or both, without having to reuse the frame, since I already have the detected/selected coordinates saved in an array.


Solution

  • After a deep investigation, I found a good solution:

    The projection matrix is the product of the intrinsic and extrinsic camera matrices.

    cv2.getPerspectiveTransform() gives us an equivalent 3x3 perspective matrix (a homography), estimated from four point correspondences, when we don't have the camera parameters.

    cv2.warpPerspective() applies such a matrix to warp the image itself, as in the sketch below.
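    For illustration, this is how those two functions are typically combined when only four reference points are known (the frame and the coordinates here are placeholders):

        import cv2
        import numpy as np

        # Placeholder frame and correspondences, only to illustrate the two calls
        frame = np.zeros((720, 1280, 3), dtype=np.uint8)
        src_points = np.float32([[200, 300], [1100, 300], [1250, 700], [50, 700]])
        dst_points = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

        homography = cv2.getPerspectiveTransform(src_points, dst_points)  # 3x3 matrix
        bev_frame = cv2.warpPerspective(frame, homography, (400, 600))    # warped image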

    For the problem above we don't need these two functions, since we already have the intrinsics, the extrinsics and the coordinates of the points in the image.

    Considering the above, I wrote a function to project a point point_x_y into the BEV given the intrinsics and the extrinsics:

        import cv2
        import numpy as np

        def compute_point_perspective_transformation(intrinsics, extrinsics, point_x_y):
            """Auxiliary function to project a specific point to BEV

            Parameters
            ----------
            intrinsics (array)     : The 3x3 camera intrinsic matrix
            extrinsics (array)     : The 3x4 camera extrinsic matrix [R|t]
            point_x_y (tuple[x,y]) : The coordinates of the point to be projected to BEV

            Returns
            ----------
            tuple[x,y] : the projection of the point
            """
            # In the intrinsics we have parameters such as the focal length and the principal point
            intrinsics_matrix = np.array(intrinsics, dtype='float32')

            # The extrinsic matrix stores the pose of the camera in global space:
            # the first 3 columns hold the rotation matrix and the last one the translation vector
            extrinsics_matrix = np.array(extrinsics, dtype='float32')

            # Drop the 3rd column (the z axis): for points on the ground plane z = 0,
            # so that column does not contribute, leaving a 3x3 plane-to-image homography
            extrinsics_matrix = extrinsics_matrix[:, [0, 1, 3]]

            # Projection matrix = intrinsics * [r1 r2 t]
            projection_matrix = np.matmul(intrinsics_matrix, extrinsics_matrix)

            # cv2.perspectiveTransform expects an array of shape (N, 1, 2)
            list_points_to_detect = np.array([[point_x_y]], dtype=np.float32)
            transformed_points = cv2.perspectiveTransform(list_points_to_detect, projection_matrix)
            return transformed_points[0][0][0], transformed_points[0][0][1]
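
    A quick usage sketch with made-up calibration values (replace them with your own cv2.calibrateCamera() results):

        # Made-up intrinsic and extrinsic matrices, purely to show the call
        K = [[1000.0, 0.0, 640.0],
             [0.0, 1000.0, 360.0],
             [0.0, 0.0, 1.0]]
        Rt = [[1.0, 0.0, 0.0, 10.0],
              [0.0, 1.0, 0.0, 5.0],
              [0.0, 0.0, 1.0, 50.0]]

        bev_x, bev_y = compute_point_perspective_transformation(K, Rt, (320, 240))
        print(bev_x, bev_y)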