Tags: 3d, directx-11, direct3d, coordinate-transformation, mouse-picking

What is the relation between View Space & NDC?


I want to transform a picking ray from screen space coordinates to view space, for picking purposes, in DirectX 11.

Below is a part (from "Introduction to 3D Game Programming with DirectX 11" by Frank D. Luna) explaining this transform.

I don't understand the part in red: as far as I know, we multiply vertices by the projection matrix to transform them from view space to homogeneous clip space. Then the hardware does the perspective division, transforming them into NDC space. So how can we reverse this transform just by multiplying the x coordinate by the aspect ratio r?
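For reference (if I read the book's projection matrix correctly), with vertical field of view α and aspect ratio r = width/height, the projection followed by the divide by w = z_v works out to

$$x_{ndc} = \frac{x_v}{r\,\tan(\alpha/2)\,z_v}, \qquad y_{ndc} = \frac{y_v}{\tan(\alpha/2)\,z_v},$$

so r only ever enters as a scale on x, while the actual perspective comes from the division by z_v.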

Generally XMVector3Unproject() is used to achieve this transformation: it reverses the viewport transform and then multiplies by the (inverse projection × inverse view × inverse world) matrix.
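For context, this is roughly how I would use it (a minimal sketch; variable names such as mouseX and vpWidth are mine, not from the book):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Build a picking ray in world space by unprojecting the cursor position
// at the near (z = 0) and far (z = 1) ends of the viewport depth range.
void BuildPickRay(float mouseX, float mouseY, float vpWidth, float vpHeight,
                  FXMMATRIX proj, CXMMATRIX view, CXMMATRIX world,
                  XMVECTOR& rayOrigin, XMVECTOR& rayDir)
{
    XMVECTOR nearPt = XMVectorSet(mouseX, mouseY, 0.0f, 1.0f);
    XMVECTOR farPt  = XMVectorSet(mouseX, mouseY, 1.0f, 1.0f);

    // XMVector3Unproject reverses the viewport transform and then applies
    // inverse(projection), inverse(view), inverse(world).
    XMVECTOR nearWorld = XMVector3Unproject(nearPt, 0.0f, 0.0f, vpWidth, vpHeight,
                                            0.0f, 1.0f, proj, view, world);
    XMVECTOR farWorld  = XMVector3Unproject(farPt,  0.0f, 0.0f, vpWidth, vpHeight,
                                            0.0f, 1.0f, proj, view, world);

    rayOrigin = nearWorld;
    rayDir    = XMVector3Normalize(XMVectorSubtract(farWorld, nearWorld));
}
```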

Can someone explain how (and why) this other "method" using the aspect ratio works?

[Image: excerpt explaining the screen-space to view-space transformation]

EDIT: I've added the referenced section 5.6.3.3 below:

[Image: excerpt of section 5.6.3.3 from the book]


Solution

  • I don't understand the part in red: as far as I know, we multiply vertices by the projection matrix to transform them from view space to homogeneous clip space. Then the hardware does the perspective division, transforming them into NDC space. So how can we reverse this transform just by multiplying the x coordinate by the aspect ratio r?

    This works only for points that lie in the image plane in the first place (in view space), because those points do not change their x and y coordinates when being projected, apart from the aspect scaling in the x direction.

    You can think of view space as a (viewing) frustum located at the camera center C. The image plane intersects this viewing frustum at some distance from the camera center C (namely at the zNear distance). When doing perspective projection, things that are closer to C than zNear are scaled to appear bigger on screen and things "behind" the image plane are downscaled (this is the perspective distortion). Technically this is achieved by the division by w in homogeneous coordinates. The key point is that points in the image plane are not scaled. You can imagine the frustum morphing into a cube, with the image plane keeping its size: it is the intersection of an infinitely large plane with the frustum as well as with the cube.

    Now, after imagining the frustum (view space) --> cube morph, the only thing left to do is to apply the aspect ratio so that the (NDC) cube's x and y coordinates match the screen rectangle. This is done by keeping y and dividing only x by r. And THIS is the step that is undone by taking the NDC coordinates and multiplying x by r. But that only brings you from the square NDC cross section back to the rectangle-shaped image plane; the projection itself is NOT undone by this.

    The trick is that this image-plane cross section by design equals the cross section in view space, as I described with the imagined morphing. So you could technically say that your point (x_v, y_v) is in view-space coordinates again, although you are always restricted to the image plane. The reason to speak of view space (and it is a good one) is that you can now shoot a ray from C through your (x_v, y_v), and your original 3D object point lies on this ray. Only its depth z is not known. You could get this depth from a depth-buffer lookup, for example, which may be what XMVector3Unproject is doing (I guess). A small sketch of this ray construction follows below.
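To make this concrete, here is a minimal sketch of the view-space ray construction I mean. The names sx, sy, proj, width, height are assumptions on my part, and I'm following Luna's convention that P(0,0) = 1/(r·tan(α/2)) and P(1,1) = 1/tan(α/2):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Build a picking ray in VIEW space from a screen-space point (sx, sy).
// Sketch: undo the viewport transform to get NDC, then divide by P(0,0)
// and P(1,1) to land on the view-space image plane at z = 1.
void ScreenToViewRay(float sx, float sy, float width, float height,
                     FXMMATRIX proj, XMVECTOR& rayOrigin, XMVECTOR& rayDir)
{
    XMFLOAT4X4 P;
    XMStoreFloat4x4(&P, proj);

    // Viewport -> NDC (note the y flip: screen y grows downward).
    float xNdc = 2.0f * sx / width - 1.0f;
    float yNdc = 1.0f - 2.0f * sy / height;

    // NDC -> view space at z = 1.
    // P(0,0) = 1 / (r * tan(alpha/2)), so dividing by it is the same as
    // multiplying by r * tan(alpha/2): the factor r is the aspect step
    // discussed above, tan(alpha/2) accounts for the field of view.
    float xv = xNdc / P._11;
    float yv = yNdc / P._22;

    // The ray starts at the camera center C (origin of view space) and goes
    // through the image-plane point (xv, yv, 1).
    rayOrigin = XMVectorSet(0.0f, 0.0f, 0.0f, 1.0f);
    rayDir    = XMVector3Normalize(XMVectorSet(xv, yv, 1.0f, 0.0f));
}
```

From there you would transform rayOrigin and rayDir by the inverse view matrix (and, per object, by the inverse world matrix) to do the actual intersection tests, which is where this meets the XMVector3Unproject route mentioned in the question.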