Tags: c++, opengl, glsl, shader

How many times are the vertex shader, geometry shader, and fragment shader each called to draw one frame?


To draw one frame, or put differently, given an array of vertices that is passed through the stages of a shader program, is there a strict correspondence between the invocations of one stage and those of the next?

For example, given the 3 vertices (0,1,0), (1,1,0), and (1,0,0), the vertex shader would execute 3 times if it declares "layout (location = 0) in vec3 aPos;" at the top. That is because the vertex shader executes once per vertex. It is the vertex attributes that determine how many vertices there are, so even if the buffer holds 6 vec3 values, the vertex shader would still execute 3 times if half of them are the normals of the vertices.

I have no idea how many times the geometry shader executes, since it can output more vertices than it receives as input. And what about the fragment shader?

I know that rasterization happens before the fragment shader. It maps a continuous signal (real-valued coordinates, whose fidelity is limited only by the float size) onto a discrete signal (the screen, limited by its number of pixels). Is there a correspondence between a triangle and the number of times the fragment shader executes for it? For example, given a triangle (3 vertices), does the program rasterize it and then execute the fragment shader once for each pixel in the covered area?

Or does the fragment shader execute as many times in total as there are pixels on the screen, like covering a wall with a paint roller?

Edit: I guess the geometry shader is executed once per primitive, since its input is an array of structs representing the vertices of one primitive.


Solution

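  • The vertex shader runs as many times as the value you pass as the last parameter of glDrawArrays (the parameter called count). With instanced drawing (glDrawArraysInstanced) it runs count times per instance, and with indexed drawing (glDrawElements) the GPU may reuse cached results for repeated indices, so it can run fewer times.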

    The geometry shader runs once per primitive (point, line, or triangle) it receives, or several times per primitive if you use geometry shader instancing (the invocations layout qualifier). Keep in mind, however, that tessellation shaders (if you have them) usually increase the number of primitives.

    In theory, the fragment shader runs roughly once per pixel covered by your primitives. In practice it is more complicated, because many factors influence this. If you enable the depth test or the stencil test, the GPU will most likely skip the fragment shader for pixels that fail those tests. The GPU also won't rasterize pixels outside the viewport you set with glViewport. With multisampling (when per-sample shading is enabled) or conservative rasterization, the fragment shader may run more times. One more important detail: the GPU doesn't shade single pixels; it runs the fragment shader on 2x2 pixel blocks (quads) so that derivatives (dFdx, dFdy) can be computed, which means a partially covered quad still runs invocations for its uncovered pixels.

    Say you have a vertex buffer with 6 vertices. If you call glDrawArrays with a count of 6, the vertex shader runs 6 times, regardless of the primitive mode you use. With GL_POINTS, the geometry shader runs 6 times, because every vertex is drawn individually. With GL_LINES, it runs 3 times, because every line is made up of 2 vertices. With GL_TRIANGLES, it runs 2 times, because every triangle is made up of 3 vertices. The fragment shader count depends on the situation. For example, on a higher-resolution screen it runs more times, because the same primitive covers more pixels; but again, it depends on all the configuration mentioned above.

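    So it's not easy to predict the number of fragment shader invocations, especially for a complex scene. Luckily, you can create a query object in OpenGL with the target GL_SAMPLES_PASSED to make OpenGL count the number of samples (pixels, when multisampling is off) that pass the depth test. On OpenGL 4.6, or with ARB_pipeline_statistics_query, a GL_FRAGMENT_SHADER_INVOCATIONS query counts the fragment shader invocations directly.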

    Since the fragment shader is usually the most performance-heavy shader stage (and usually the one that runs most often), it is important to run it as few times as possible. For example, some game engines sort meshes by their distance from the camera and render closer meshes first, because closer meshes may cover farther ones, whose fragments can then be rejected by the depth test. There are many other techniques to reduce the number of fragment shader invocations, such as occlusion culling, dynamic resolution scaling, checkerboard rendering, and image upscaling (like DLSS, FSR, or XeSS).