python opencv graphics 3d computer-vision

How do I project a 3D point from a 3D world with a specific camera position into a 2D image?


I'm having a hard time understanding how a 3D point from a scene can be projected into a 2D image.

I have created a scene in Blender where my camera is positioned at P_cam(0|0|0) and is looking between the x and y axes (rotation of x=90° and z=-45°).

I spawned a test cube at pos_c (5|5|0).

The image I see looks like this:

expected

I now want to reproduce the same image in plain Python. For that I use OpenCV to map the 3D points to 2D points. I created my camera matrix (with (0|0) referring to the middle of the image):

import numpy as np

# fx, fy: focal lengths in pixels (defined elsewhere)
cx = width/2  # principal point is our image center
cy = height/2
camera_matrix = np.array([[fx, 0, cx],
                          [0, fy, cy],
                          [0, 0, 1]], np.float32)
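One detail the snippet leaves open is where fx and fy come from. For a synthetic camera they follow from the field of view; here is a minimal sketch (the resolution and FOV below are hypothetical, match them to your Blender camera):

```python
import numpy as np

# Hypothetical resolution and horizontal FOV; use your Blender camera's values.
width, height = 640, 480
fov_x = np.radians(90)

fx = width / (2 * np.tan(fov_x / 2))  # focal length in pixels
fy = fx                               # assuming square pixels

cx = width / 2  # principal point is the image center
cy = height / 2
camera_matrix = np.array([[fx, 0, cx],
                          [0, fy, cy],
                          [0, 0, 1]], np.float32)
```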

I then use OpenCV's projectPoints method to transform my cube's world coordinates into 2D image points. I apply no translation in the form of the tvec. But I know that I need the rotation vector, and that's the tricky part for me. I tried to use a vector with the same values as in Blender, but it did not work.

cam_rot_in_deg =  (90, 0, -45)
cam_rot_in_rad = np.radians(cam_rot_in_deg)
rvec = np.array([[cam_rot_in_rad]], np.float32).reshape((3,1))

The image I render looks like this:

my fails

Within the image I also visualized the coordinate axes, so I can see that the cube is drawn at the correct position within the world, but my camera perspective is completely off.

What am I missing? Thank you so much.

I tried numerous rotation angles, but I did not come across the desired solution.


Solution

  • Your issue is here:

    cam_rot_in_deg =  (90, 0, -45)
    cam_rot_in_rad = np.radians(cam_rot_in_deg)
    rvec = np.array([[cam_rot_in_rad]], np.float32).reshape((3,1))
    

    OpenCV's "rvec" is an axis-angle representation, not Euler angles. What you built there is a vector of Euler angles, which projectPoints then misinterprets as an axis-angle vector. That's no good.

    Instead of rvecs (and tvecs), you can just deal with matrices. I'd recommend 4x4 for 3D affine transforms. They're trivial to compose using matrix multiplication, which is @ for numpy arrays. That way you can compose any individual primitive rotations around any axes you like.
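A sketch of that style (the helper names here are mine, not OpenCV's): two primitive 4x4 rotations built directly, composed with @, and applied to a homogeneous point.

```python
import numpy as np

def rot_x(angle):
    """4x4 rotation about the x axis."""
    c, s = np.cos(angle), np.sin(angle)
    M = np.eye(4)
    M[1, 1], M[1, 2] = c, -s
    M[2, 1], M[2, 2] = s, c
    return M

def rot_z(angle):
    """4x4 rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    M = np.eye(4)
    M[0, 0], M[0, 1] = c, -s
    M[1, 0], M[1, 1] = s, c
    return M

# Compose with @: right-to-left, so this rotates about z first, then x.
M = rot_x(np.radians(90)) @ rot_z(np.radians(-45))

# Apply to a homogeneous point.
p = np.array([5.0, 5.0, 0.0, 1.0])
q = M @ p
```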

    OpenCV has the cv.Rodrigues() function that converts back and forth between axis-angle vector and (3x3) rotation matrix.
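For intuition, the axis-angle-to-matrix direction can be written out in a few lines of plain numpy (this is what cv.Rodrigues computes; the real function also handles the reverse direction):

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (what cv.Rodrigues computes)."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)   # rotation angle = length of the vector
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta               # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])  # cross-product (skew-symmetric) matrix
    # Rodrigues formula: R = I + sin(t) K + (1 - cos(t)) K^2
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# 90 degrees about the x axis
R = rodrigues([np.pi / 2, 0, 0])
```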

    For convenience, you should define utility functions that build matrices for translation (translation along each axis), scaling (factor, or factor per dimension), rotation (from axis and angle).
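A sketch of such helpers (all names hypothetical), each producing a 4x4 that composes with the others via @:

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 translation along each axis."""
    M = np.eye(4)
    M[:3, 3] = (tx, ty, tz)
    return M

def scaling(sx, sy=None, sz=None):
    """4x4 scaling: one factor, or one factor per dimension."""
    if sy is None: sy = sx
    if sz is None: sz = sx
    return np.diag([sx, sy, sz, 1.0])

def rotation(axis, angle):
    """4x4 rotation by `angle` (radians) about the vector `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)
    M = np.eye(4)
    M[:3, :3] = R
    return M

# Compose like any other matrices.
M = translation(1, 2, 3) @ rotation([0, 0, 1], np.radians(-45))
```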

    Another convenience function you'll probably want takes a 4x4 transform matrix apart into its rotation and translation parts. Slicing out the rotation part is trivial already, so that function should also call Rodrigues — then it can turn a transform matrix into the rvec and tvec that OpenCV commonly works with. A function for the forward direction (rvec/tvec to 4x4) just does the opposite.
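A sketch of that pair of functions. In an actual OpenCV program you would call cv.Rodrigues for the rotation conversions; plain-numpy stand-ins are used here so the sketch runs on its own (the matrix-to-rvec direction ignores the theta ≈ π edge case):

```python
import numpy as np

def rvec_to_mat(rvec):
    """Axis-angle -> 3x3 rotation matrix (stand-in for cv.Rodrigues)."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def mat_to_rvec(R):
    """3x3 rotation matrix -> axis-angle (stand-in for cv.Rodrigues)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    # The skew part of R encodes the axis: R - R.T = 2 sin(theta) K
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2 * np.sin(theta))
    return theta * axis

def to_rtvec(M):
    """4x4 transform -> (rvec, tvec)."""
    return mat_to_rvec(M[:3, :3]), M[:3, 3].copy()

def from_rtvec(rvec, tvec):
    """(rvec, tvec) -> 4x4 transform."""
    M = np.eye(4)
    M[:3, :3] = rvec_to_mat(rvec)
    M[:3, 3] = tvec
    return M
```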

    Euler angles are generally unwieldy (gimbal lock), so they aren't used as a medium for anything, merely for initialization. They're confusing too, because everyone has their own conventions for the order of the rotations, and even which axes there are.
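The order-dependence is easy to demonstrate: the same two primitive rotations composed in the two possible orders give different matrices (a quick numpy check):

```python
import numpy as np

def rot_x(a):
    """3x3 rotation about the x axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    """3x3 rotation about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

a, b = np.radians(90), np.radians(-45)
# x-then-z is not z-then-x: Euler angles only mean something
# together with a fixed convention for the rotation order.
same = np.allclose(rot_x(a) @ rot_z(b), rot_z(b) @ rot_x(a))
```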