[SOLVED] Given camera parameters, how do I find the transform from view space to pixel coordinates? What is wrong with my matrix?

For a specific image containing a known 3d obj model, I have the corresponding model matrix and the camera parameters fx,fy,cx,cy. Having applied the model matrix to the 3d model vertices, I want to find the projection matrix that will project the vertices exactly on the corresponding object in the image. I use this projection matrix:

2 * fx / w,        0,           1 - 2 * cx / w,            0,
     0,       -2 * fy / h,    -(1 - 2 * cy / h),           0,
     0,            0,         (f + n) / (n - f),  (2 * f * n) / (n - f),
     0,            0,               -1,                    0

w is the width of the image, h is the height, f is the far clipping plane and n the near clipping plane. From what I found, we ignore clipping planes when using real cameras so we can write the projection matrix as:

2 * fx / w,        0,           1 - 2 * cx / w,     0,
     0,       -2 * fy / h,    -(1 - 2 * cy / h),    0,
     0,            0,               -1,             0,
     0,            0,               -1,             0

After applying the projection matrix on a 3D point, I want to convert x and y to pixel coordinates. To do this, I do the following. Let p be a point of the 3d model in homogeneous coordinates after applying model and projection transform:

float x=p.x/p.w; 
float y=p.y/p.w;
// x and y are now in the range [-1,1]
x=(x+1)*(w/2);
y=(y+1)*(h/2);
// x and y are now in pixel coordinates.
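
For reference, the steps above can be sketched in numpy as follows. This is only a minimal sketch: the intrinsics, the image size and the sample points are made-up values, the helper names `projection_matrix` and `to_pixels` are hypothetical, and it assumes the points are in an OpenGL-style view space where the camera looks down the negative Z axis (which is what the -1 in the last row of the matrix implies).

```python
import numpy as np

def projection_matrix(fx, fy, cx, cy, w, h):
    """Build the simplified 4x4 projection matrix exactly as written above."""
    return np.array([
        [2 * fx / w, 0.0, 1 - 2 * cx / w, 0.0],
        [0.0, -2 * fy / h, -(1 - 2 * cy / h), 0.0],
        [0.0, 0.0, -1.0, 0.0],
        [0.0, 0.0, -1.0, 0.0],
    ])

def to_pixels(points_view, P, w, h):
    """Project Nx3 view-space points, then map NDC [-1, 1] to pixel coordinates."""
    pts = np.asarray(points_view, dtype=float)
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # Nx4 homogeneous points
    clip = homo @ P.T                                    # apply the projection
    ndc = clip[:, :2] / clip[:, 3:4]                     # perspective divide by w
    x = (ndc[:, 0] + 1) * (w / 2)
    y = (ndc[:, 1] + 1) * (h / 2)
    return np.stack([x, y], axis=1)

# Hypothetical intrinsics and image size, for illustration only.
fx, fy, cx, cy, w, h = 800.0, 800.0, 310.0, 250.0, 640, 480
P = projection_matrix(fx, fy, cx, cy, w, h)

# Two sample points in front of the camera (negative Z under the assumption above).
pixels = to_pixels([[0.0, 0.0, -2.0], [0.5, -0.25, -2.0]], P, w, h)
print(pixels)
```

Running the same sample points through the standard projection given in the solution and comparing the two outputs is a quick way to see which coordinate disagrees.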

Even though I'm very close, the result is still not correct.

Where is the mistake?

Solution

The projection you are using is rather unusual. The standard pinhole projection, which maps camera coordinates directly to pixel coordinates, is:

# python, numpy
import numpy as np

K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
# xyz is a 3d point in camera coordinates
xyz = getMyXYZ()
# project into homogeneous image coordinates
uvw = K.dot(xyz)
# pixel coordinates
uv = uvw[:2] / uvw[2]

The above assumes that:

There is no lens distortion.
The camera coordinate frame has Z going out of the (cx, cy) image pixel toward the scene, X going toward the right (parallel to an image row), and Y going down, with the origin fx pixels away behind the image.
The image coordinate frame's origin is at the center of the top-left pixel, with the x axis increasing toward the right and y going down.
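
A vectorized version of the projection above, under the same assumptions, might look like the sketch below. The intrinsics and sample points are made-up values and `project_points` is a hypothetical helper name. The sketch also assumes the points start in an OpenGL-style view space (Y up, camera looking down -Z), so Y and Z are negated first to reach the camera frame described in the list (Y down, Z toward the scene); skip that step if your points are already in that frame.

```python
import numpy as np

def project_points(K, pts_cam):
    """Project Nx3 camera-frame points (X right, Y down, Z forward) to Nx2 pixels."""
    pts = np.asarray(pts_cam, dtype=float)
    uvw = pts @ K.T                   # homogeneous image coordinates, one row per point
    return uvw[:, :2] / uvw[:, 2:3]   # divide by depth to get pixel coordinates

# Hypothetical intrinsics, for illustration only.
fx, fy, cx, cy = 800.0, 800.0, 310.0, 250.0
K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

# Assumption: input points are in an OpenGL-style view space (Y up, -Z forward).
pts_view = np.array([[0.0, 0.0, -2.0], [0.5, -0.25, -2.0]])

# Negate Y and Z to get the frame the K-based projection expects.
pts_cam = pts_view * np.array([1.0, -1.0, -1.0])

uv = project_points(K, pts_cam)
print(uv)
```

A point on the optical axis should land on (cx, cy), which makes a convenient sanity check for any projection pipeline built from these parameters.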