c++ opengl matrix projection glm-math

Screen Coordinates to World Coordinates


I want to convert from screen coordinates to world coordinates in OpenGL. I am using GLM for that purpose (I am also using GLFW).

This is my code:

static void mouse_callback(GLFWwindow* window, int button, int action, int mods)
{
    if (button == GLFW_MOUSE_BUTTON_LEFT) {
        if(GLFW_PRESS == action){
            int height = 768, width = 1024;
            double xpos, ypos;
            float zpos;
            glfwGetCursorPos(window, &xpos, &ypos);

            glReadPixels(xpos, ypos, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &zpos);

            glm::mat4 m_projection = glm::perspective(glm::radians(45.0f), (float)width / (float)height, 0.1f, 1000.0f);

            glm::vec3 win(xpos,height - ypos, zpos);
            glm::vec4 viewport(0.0f,0.0f,(float)width, (float)height);
            glm::vec3 world = glm::unProject(win, mesh.getView() * mesh.getTransform(),m_projection,viewport);

            std::cout << "screen " << xpos << " " << ypos << " " << zpos << std::endl;
            std::cout << "world " << world.x << " " << world.y << " " << world.z << std::endl;
        }
    }
}

Now I have two problems. The first is that the world vector I get from glm::unProject has very small x, y and z values. If I use these values to translate the mesh, the mesh only moves slightly and doesn't follow the mouse pointer.

The second problem is that, as said in the glm docs (https://glm.g-truc.net/0.9.8/api/a00169.html#ga82a558de3ce42cbeed0f6ec292a4e1b3), the result is returned in object coordinates. So in order to convert from screen to world coordinates I should use the transform matrix of one mesh, but what happens if I have many meshes and I want to convert from screen to world coordinates? What model matrix should I multiply by the camera view matrix to form the ModelView matrix?


Solution

  • There are a couple of issues with this sequence:

           glfwGetCursorPos(window, &xpos, &ypos);
           glReadPixels(xpos, ypos, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &zpos);
           [...]
           glm::vec3 win(xpos,height - ypos, zpos);
    
    1. Window space origin. glReadPixels is a GL function, and as such adheres to GL's conventions, with the origin being the lower-left pixel. While you flip to that convention for your win variable, you still use the wrong origin when reading the depth buffer.

    Furthermore, your flipping is wrong. Since ypos should be in [0, height-1], the correct formula is height-1 - ypos, so you are also off by one here. (We will see later that this isn't exactly true either.)

    1. "Screen Coordinates" vs. Pixel Coordinates. Your code assumes that the coordinates you get back from GLFW are in pixels. This is not the case. GLFW uses the concept of "virtual screen coordinates" which don't necessarily map to pixels:

    Pixels and screen coordinates may map 1:1 on your machine, but they won't on every other machine, for example on a Mac with a Retina display. The ratio between screen coordinates and pixels may also change at run-time depending on which monitor the window is currently considered to be on.

    GLFW generally provides two sizes for a window: glfwGetWindowSize returns the size in said virtual screen coordinates, while glfwGetFramebufferSize returns the actual size in pixels, which is what matters for OpenGL. So basically, you must query both sizes, and then you can appropriately scale the mouse coordinates from screen coordinates to the actual pixel position you need.

    3. Sub-pixel position. While glReadPixels addresses a specific pixel with integer coordinates, the whole transformation math works with floating point and can represent arbitrary sub-pixel positions. GL's window space is defined so that integer coordinates represent the corners of the pixels; the pixel centers lie at half-integer coordinates. Your win variable will represent the lower-left corner of said pixel, but the more useful convention is to use the pixel center, so you'd better add an offset of (0.5f, 0.5f, 0.0f) to win, assuming you point to the pixel center. (We can do a bit better if the virtual screen coordinates are higher resolution than our pixels, which means we already get a sub-pixel position for the mouse cursor, but the math won't change, because we still have to switch to GL's convention where integer means border instead of integer means center.) Note that since we now consider a space which goes from [0,w) in x and [0,h) in y, this also affects point 1: if you click at pixel (0,0), it will have the center (0.5, 0.5), and the y flipping should be h - y, so h - 0.5 (which should be rounded down towards h-1 when accessing the framebuffer pixel).

    To put it all together, you could do (conceptually):

    int screen_w, screen_h, pixel_w, pixel_h;
    double xpos, ypos;
    glfwGetWindowSize(window, &screen_w, &screen_h);    // better use the callback and cache the values
    glfwGetFramebufferSize(window, &pixel_w, &pixel_h); // better use the callback and cache the values
    glfwGetCursorPos(window, &xpos, &ypos);

    glm::vec2 screen_pos = glm::vec2(xpos, ypos);
    glm::vec2 pixel_pos = screen_pos * glm::vec2(pixel_w, pixel_h) / glm::vec2(screen_w, screen_h); // note: not necessarily integer
    pixel_pos = pixel_pos + glm::vec2(0.5f, 0.5f); // shift to GL's pixel-center convention
    glm::vec3 win = glm::vec3(pixel_pos.x, pixel_h - pixel_pos.y, 0.0f);
    glReadPixels((GLint)win.x, (GLint)win.y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &win.z);
    // ... unproject win
    

    What model matrix should I multiply by the camera view matrix to form the ModelView matrix?

    None. The basic coordinate transformation pipeline is

    Object Space -> {MODEL} -> World Space -> {VIEW} -> Eye Space -> {PROJ} -> Clip Space -> {perspective divide} -> NDC -> {Viewport/DepthRange} -> Window Space
    

    There is no model matrix involved anywhere on the way from world space to window space, hence inverting that path does not depend on any model matrix either.
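
    To make this concrete, here is a minimal sketch of that inverse path, which is roughly what glm::unProject computes when you pass it only the view matrix (assuming GLM's default [-1,1] NDC depth range; the function name is made up for illustration):

    #include <glm/glm.hpp>

    // Window space -> NDC -> world space. Note that no model matrix appears anywhere.
    glm::vec3 windowToWorld(const glm::vec3& win,        // pixel x/y (GL convention, origin bottom-left) and depth in [0,1]
                            const glm::mat4& view,
                            const glm::mat4& projection,
                            const glm::vec4& viewport)   // (x, y, width, height) in pixels
    {
        // invert the viewport transform and depth range mapping: window space -> NDC
        glm::vec4 ndc((win.x - viewport.x) / viewport.z * 2.0f - 1.0f,
                      (win.y - viewport.y) / viewport.w * 2.0f - 1.0f,
                      win.z * 2.0f - 1.0f,
                      1.0f);
        // undo projection and view in one step; dividing by w inverts the perspective divide
        glm::vec4 world = glm::inverse(projection * view) * ndc;
        return glm::vec3(world) / world.w;
    }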

    that as said in the glm docs (https://glm.g-truc.net/0.9.8/api/a00169.html#ga82a558de3ce42cbeed0f6ec292a4e1b3) the result is returned in object coordinates.

    The math doesn't care which spaces you transform between. The documentation mentions object space, and the function uses an argument named modelView, but which matrix you put there only determines which space the result is expressed in. Putting just the view matrix there is fine and gives you world-space coordinates.
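
    Applied to the code from the question, that just means passing the camera's view matrix on its own (written here as view, i.e. whatever mesh.getView() returns, purely for illustration) instead of a view*model product:

    glm::vec4 viewport(0.0f, 0.0f, (float)pixel_w, (float)pixel_h);
    glm::vec3 world = glm::unProject(win, view, m_projection, viewport); // world-space position of the click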

    So in order to convert screen to world coordinates I should use a transform matrix from one mesh.

    Well, you could even do that. You could use any model matrix of any object, as long as that matrix isn't singular, and as long as you use the same matrix for the unproject as you later use for going from object space to world space. You can even make up a random matrix, if you make sure it is regular. (Well, there might be numerical issues if the matrix is ill-conditioned.)

    The key thing here is that when you specify (V*M) and P as the matrices for glm::unProject, it will internally calculate (V*M)^-1 * P^-1 * ndc_pos, which is M^-1 * V^-1 * P^-1 * ndc_pos. If you transform the result back from object space to world space, you multiply that by M again, resulting in M * M^-1 * V^-1 * P^-1 * ndc_pos, which is of course just V^-1 * P^-1 * ndc_pos, which is what you would have gotten directly if you hadn't put M into the unproject in the first place. You have just added more computational work, and introduced more potential for numerical issues...
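
    If you want to convince yourself of that, here is a small sketch (the function and parameter names are placeholders for whatever matrices you would otherwise pass in):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp> // glm::unProject

    // Unprojecting with (view * model) and then mapping the result back to world
    // space with model yields the same point as unprojecting with view alone,
    // because M * M^-1 cancels (up to floating point error, assuming model is regular).
    void compareUnproject(const glm::vec3& win, const glm::mat4& model,
                          const glm::mat4& view, const glm::mat4& projection,
                          const glm::vec4& viewport)
    {
        glm::vec3 obj    = glm::unProject(win, view * model, projection, viewport); // object space
        glm::vec3 worldA = glm::vec3(model * glm::vec4(obj, 1.0f));                 // object -> world
        glm::vec3 worldB = glm::unProject(win, view, projection, viewport);         // world space directly
        // worldA and worldB agree to within floating point precision.
    }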