python opencv computer-vision homography

How to estimate the extrinsic matrix from a chessboard image and project the image to a bird's eye view so that each pixel has a fixed size in meters?

I want to generate an Occupancy Grid (OG) like image with a Bird's Eye View (BEV), i.e., each image pixel has a constant unit measure and everything on the final grid is floor (height=0).

I don't know what I'm missing. I'm new to the subject and I'm trying to follow a pragmatic step-by-step approach to get to the final result. I have spent a lot of time on this and I'm still getting poor results. I'd appreciate any help. Thanks.

To get the desired result, I follow this pipeline:

1. Estimate the extrinsic matrix with cv2.solvePnP and a chessboard image.
2. Generate the OG grid XYZ world coordinates (X=right, Y=height, Z=forward).
3. Project the OG grid XYZ world coordinates into camera coordinates with the extrinsic matrix.
4. Match the uv image coordinates to the OG grid camera coordinates.
5. Populate the OG image with the uv pixels.

I have the following intrinsic matrix and distortion coefficients, which I previously estimated from another 10 chessboard images like the one below:

1. Estimate the extrinsic matrix

import numpy as np
import cv2
import matplotlib.pyplot as plt


mtx = np.array([[2029,    0, 2029],
                [   0, 1904, 1485],
                [   0,    0,    1]]).astype(float)

dist = np.array([[-0.01564965,  0.03250585,  0.00142366,  0.00429703, -0.01636045]])

impath = '....'
img = cv2.imread(impath)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
CHECKERBOARD = (5, 8)
ret, corners = cv2.findChessboardCorners(gray, CHECKERBOARD, None)
corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

objp = np.concatenate(
            np.meshgrid(np.arange(-4, 4, 1),
                        0,
                        np.arange(0, 5, 1), 
                        )
        ).astype(float)

objp = np.moveaxis(objp, 0, 2).reshape(-1, 3)

square_size = 0.029
objp *= square_size

ret, rvec, tvec = cv2.solvePnP(objp, corners[::-1], mtx, dist)
print('rvec:', rvec.T)
print('tvec:', tvec.T)

# img_withaxes = cv2.drawFrameAxes(img.copy(), mtx, dist, rvec, tvec, square_size, 3)
# plt.imshow(cv2.resize(img_withaxes[..., ::-1], (800, 600)))


# rvec: [[ 0.15550242 -0.03452503 -0.028686  ]]
# tvec: [[0.03587237 0.44082329 0.62490573]]

R = cv2.Rodrigues(rvec)[0]
RT = np.eye(4)
RT[:3, :3] = R
RT[:3, 3] = tvec.ravel()
RT.round(2)

# array([[-1.  ,  0.03,  0.04,  0.01],
#        [ 0.03,  0.99,  0.15, -0.44],
#        [-0.03,  0.16, -0.99,  0.62],
#        [ 0.  ,  0.  ,  0.  ,  1.  ]])
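
One quick sanity check on an estimated pose is to recover the camera position in world coordinates. The sketch below assumes the OpenCV convention x_cam = R @ x_world + t, which is what cv2.solvePnP returns; the helper name `camera_center_in_world` and the numeric pose are illustrative and are not the calibration from this post.

```python
import numpy as np

def camera_center_in_world(R, t):
    """Camera position in world coordinates, assuming x_cam = R @ x_world + t."""
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    # Solve R @ C + t = 0 for C; R is orthonormal, so its inverse is R.T.
    return -R.T @ t

# Illustrative pose only: a 90 degree rotation about the X axis.
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 0.44, 0.62])

C = camera_center_in_world(R, t)
print(C)
```

With the world axes used here (Y = height), the magnitude of the Y component of C should be close to the real camera height above the chessboard plane. If it is not, the object points or the corner ordering are worth a second look.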

2. Generate the OG grid XYZ world coordinates (X=right, Y=height, Z=forward).

uv_dims = img.shape[:2] # h, w
grid_dims = (500, 500) # h, w

og_grid = np.concatenate(
                np.meshgrid(
                    np.arange(- grid_dims[0] // 2, (grid_dims[0] + 1) // 2, 1),
                    0, # I want only the floor information, such that height = 0
                    np.arange(grid_dims[1]),
                    1
                    )
                )
og_grid = np.moveaxis(og_grid, 0, 2)

edge_size = .1
og_grid_3dcoords = og_grid * edge_size
print(og_grid_3dcoords.shape)

# (500, 500, 4, 1)

3. Project the OG grid XYZ world coordinates into camera coordinates with the extrinsic matrix.

og_grid_camcoords = (RT @ og_grid_3dcoords.reshape(-1, 4).T)
og_grid_camcoords = og_grid_camcoords.T.reshape(grid_dims + (4,))
og_grid_camcoords /= og_grid_camcoords[..., [2]]
og_grid_camcoords = og_grid_camcoords[..., :3]

# Print for debugging issues
for i in range(og_grid_camcoords.shape[-1]):
    print(np.quantile(og_grid_camcoords[..., i].clip(-10, 10), np.linspace(0, 1, 11)).round(1))

# [-10.   -1.3  -0.7  -0.4  -0.2  -0.    0.2   0.4   0.6   1.2  10. ]
# [-10.   -0.2  -0.2  -0.2  -0.2  -0.2  -0.1  -0.1  -0.1  -0.1  10. ]
# [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
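
The clipped values of -10 and 10 in the debug output suggest that some grid points have a camera depth near zero or negative. A minimal sketch of guarding the perspective division is shown below. It assumes 3-component camera coordinates with Z pointing forward (the OpenCV convention); the helper name `normalize_in_front` and the `min_depth` threshold are hypothetical.

```python
import numpy as np

def normalize_in_front(cam_xyz, min_depth=1e-6):
    """Divide by depth only for points in front of the camera.

    cam_xyz: (..., 3) array of camera-frame coordinates.
    Returns (normalized, valid); invalid entries are filled with NaN.
    """
    cam_xyz = np.asarray(cam_xyz, dtype=float)
    depth = cam_xyz[..., 2]
    valid = depth > min_depth
    normalized = np.full(cam_xyz.shape, np.nan)
    normalized[valid] = cam_xyz[valid] / depth[valid][:, None]
    return normalized, valid

# One point in front of the camera, one behind it, one at the camera center.
pts = np.array([[1.0, 2.0, 2.0],
                [1.0, 1.0, -0.5],
                [0.0, 0.0, 0.0]])
norm, valid = normalize_in_front(pts)
print(valid)
print(norm)
```

Points flagged as invalid cannot be seen by the camera, so the corresponding grid cells could simply be left empty instead of being clipped to the image border.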

4. Match the uv image coordinates for the OG grid coordinates.

og_grid_uvcoords = (mtx @ og_grid_camcoords.reshape(-1, 3).T)
og_grid_uvcoords = og_grid_uvcoords.T.reshape(grid_dims + (3,))
og_grid_uvcoords = og_grid_uvcoords.clip(0, max(uv_dims)).round().astype(int)
og_grid_uvcoords = og_grid_uvcoords[..., :2]

# Print for debugging issues
for i in range(og_grid_uvcoords.shape[-1]):
    print(np.quantile(og_grid_uvcoords[..., i], np.linspace(0, 1, 11)).round(1))

# [   0.    0.  665. 1134. 1553. 1966. 2374. 2777. 3232. 4000. 4000.]
# [   0. 1134. 1161. 1171. 1181. 1191. 1201. 1212. 1225. 1262. 4000.]

Clip the uv values to the image boundaries.

mask_clip_height = (og_grid_uvcoords[..., 1] >= uv_dims[0])
og_grid_uvcoords[mask_clip_height, 1] = uv_dims[0] - 1

mask_clip_width = (og_grid_uvcoords[..., 0] >= uv_dims[1])
og_grid_uvcoords[mask_clip_width, 0] = uv_dims[1] - 1
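
Because every grid point lies on the plane Y = 0, steps 3 and 4 can also be written as a single 3x3 homography from floor coordinates (X, Z, 1) to pixels. The sketch below checks that this agrees with the full projection for one point. The intrinsic matrix is the one from this post, while the rotation, translation, and the helper name `floor_homography` are illustrative assumptions. Like step 4 above, it ignores lens distortion.

```python
import numpy as np

def floor_homography(K, R, t):
    """3x3 map from floor coordinates (X, Z, 1) on the plane Y = 0 to homogeneous pixels."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    # With Y = 0 the second column of R never contributes, so it is dropped.
    return K @ np.column_stack([R[:, 0], R[:, 2], t])

K = np.array([[2029.0, 0.0, 2029.0],
              [0.0, 1904.0, 1485.0],
              [0.0, 0.0, 1.0]])

# Illustrative pose: a small tilt about the X axis.
a = 0.3
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([0.0, 0.44, 0.62])

X, Z = 0.2, 0.5  # a floor point, in meters

H = floor_homography(K, R, t)
uvw_h = H @ np.array([X, Z, 1.0])
uv_h = uvw_h[:2] / uvw_h[2]

uvw_full = K @ (R @ np.array([X, 0.0, Z]) + t)
uv_full = uvw_full[:2] / uvw_full[2]

print(uv_h, uv_full)
```

Once H is composed with the mapping from grid indices to meters, the whole resampling could in principle be handed to cv2.warpPerspective with the cv2.WARP_INVERSE_MAP flag instead of a per-pixel loop.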

5. Populate the OG image with the uv pixels.

og = np.zeros(grid_dims + (3,)).astype(int)

for i, (u, v) in enumerate(og_grid_uvcoords.reshape(-1, 2)):
    og[i % grid_dims[1], i // grid_dims[1], :] = img[v, u]

plt.imshow(og)
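
The per-pixel loop can be replaced by NumPy fancy indexing, with a mask so that cells falling outside the image stay black instead of repeating border pixels. This is a sketch under the assumption that the uv array holds unclipped integer coordinates with u (column) first and v (row) second; the helper name `sample_image` is hypothetical.

```python
import numpy as np

def sample_image(img, uv):
    """Look up img at integer pixel coordinates (u = column, v = row).

    uv: (..., 2) integer array. Cells outside the image are left at zero.
    Returns (out, inside) where inside marks the cells that were filled.
    """
    h, w = img.shape[:2]
    u = uv[..., 0]
    v = uv[..., 1]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros(uv.shape[:-1] + img.shape[2:], dtype=img.dtype)
    out[inside] = img[v[inside], u[inside]]
    return out, inside

# Tiny 2x3 RGB image and a 2x2 grid of lookups, two of them out of range.
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
uv = np.array([[[0, 0], [2, 1]],
               [[3, 0], [1, -1]]])
out, inside = sample_image(img, uv)
print(inside)
```

Note that the loop above writes to og[i % grid_dims[1], i // grid_dims[1]], which transposes the grid relative to the layout of og_grid_uvcoords. This sketch keeps the layout of uv, so `out.transpose(1, 0, 2)` would be needed to match the loop's orientation.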

I was expecting a top-down view of the test image.

Solution

In the end, it turned out that I had made a mistake: in step 2 of the pipeline, the homogeneous coordinate (the trailing 1) of the world points was also being scaled by edge_size. Fixing this and reversing the order of the z-axis in the OG grid meshgrid yielded the BEV of the image that I expected.

The fixed snippet:

uv_dims = img.shape[:2] # h, w
grid_dims = (500, 500) # h, w

og_grid = np.concatenate(
                np.meshgrid(
                    np.arange(- grid_dims[0] // 2, (grid_dims[0] + 1) // 2, 1),
                    0, # I want only the floor information, such that height = 0
                    np.arange(grid_dims[1] - 1, -1, -1),
                    1
                    )
                )
og_grid = np.moveaxis(og_grid, 0, 2)

edge_size = .1
og_grid_3dcoords = og_grid * edge_size
og_grid_3dcoords[:, :, 3, :] = 1
print(og_grid_3dcoords.shape)
# (500, 500, 4, 1)
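
To see why scaling the homogeneous coordinate matters, the following sketch projects one grid cell three ways. Multiplying the whole 4-vector by edge_size scales the translation as well, and the factor then cancels in the perspective division, so the buggy grid behaves as if each cell were 1 unit wide. The intrinsic matrix is taken from this post; the pose (no rotation) and the helper name `project` are illustrative assumptions.

```python
import numpy as np

def project(K, RT, xyzw):
    """Pinhole projection of homogeneous world points (N, 4) to pixels (N, 2)."""
    cam = (RT @ xyzw.T).T[:, :3]
    uvw = (K @ cam.T).T
    return uvw[:, :2] / uvw[:, [2]]

K = np.array([[2029.0, 0.0, 2029.0],
              [0.0, 1904.0, 1485.0],
              [0.0, 0.0, 1.0]])

# Illustrative pose: identity rotation plus a translation.
RT = np.eye(4)
RT[:3, 3] = [0.0, 0.44, 0.62]

edge_size = 0.1
cells = np.array([[3.0, 0.0, 20.0, 1.0]])  # grid indices in homogeneous form

buggy = cells * edge_size   # the trailing 1 becomes 0.1
fixed = buggy.copy()
fixed[:, 3] = 1.0           # restore the homogeneous coordinate

uv_buggy = project(K, RT, buggy)
uv_unscaled = project(K, RT, cells)
uv_fixed = project(K, RT, fixed)

print(uv_buggy, uv_unscaled, uv_fixed)
```

Here uv_buggy matches uv_unscaled rather than uv_fixed, which means that edge_size had no effect on the projected pixels before the fix.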

The final outcome: