Tags: graphics, glsl, vulkan

Why do early fragment tests need to be specified in the shader if I write to a storage buffer?


In my Vulkan shader I get around 500 frames per second. If I write to a storage buffer, the frame rate drops to 200 fps. I discovered that the write is disabling the early fragment tests. I know this because if I place:

layout(early_fragment_tests) in; // Forces early depth tests

At the top of the file the frame rate goes back up to 500 fps.

I'm wondering, is this normal? Early fragment tests seem to be enabled by default, then writing to a storage buffer disables them automatically, and I have to re-enable them manually with that line, or "force" them, so to speak.

Really interestingly, writing to gl_FragDepth didn't have the same effect. So writing to a storage buffer disabled the early fragment tests automatically, but writing to gl_FragDepth didn't, which is strange, because writing to gl_FragDepth is the one that I thought was supposed to disable the early fragment tests.
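For reference, a minimal sketch of the kind of fragment shader being described. The buffer binding and member names (`FragCounter`, `fragCount`) are illustrative, not taken from my actual shader:

```glsl
#version 450

// Hypothetical storage buffer; any write to it is a memory side
// effect visible outside the framebuffer.
layout(set = 0, binding = 0) buffer FragCounter {
    uint fragCount;
} counter;

layout(location = 0) out vec4 outColor;

void main() {
    // This one line is enough to drop the frame rate in my case.
    atomicAdd(counter.fragCount, 1u);
    outColor = vec4(1.0);
}
```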


Solution

  • The short answer is that yes, this is expected. To understand why, let's dive into what's actually happening.

    Graphics APIs are specified to process fragment shading in application primitive order. This ordering guarantee is needed to ensure that there is a sensible programmer's model for order-dependent things such as blending, or other side-effects such as writes to memory.

    Graphics APIs are also specified to do ZS testing after fragment shading (i.e. late ZS). Doing it after fragment shading is the only point in the pipeline where we can guarantee that we can do it correctly, because the fragment processing might change the depth value or have some other user-visible side effect that we need to process.

    The entire concept of early ZS testing is an optimization, allowing hardware to completely skip running fragments. However, the implementation can only automatically use an early test in the subset of cases where it can unambiguously prove that running the fragment is not necessary, and killing it does not change the application-visible behaviour when compared to doing a late test.

    In your first case, you cannot use early ZS because the shader must run far enough to create the gl_FragDepth value needed by the ZS testing.

    In your second case, you cannot use early ZS because you have a user-visible side-effect: a write to memory outside of the framebuffer. That write must happen, because the programmer's model says that ZS testing happens after fragment shading, and skipping the write would change the application-visible behavior.

    Framebuffer writes are special: they allow early ZS testing by default (unless the shader modifies its own depth value) because there are strict rules about how the framebuffer is used and when the values in it become visible to the application. Generic memory writes to storage buffers, images, or atomics don't give any of the necessary guarantees, so they disable early ZS by default.

    In both cases, specifying layout(early_fragment_tests) is the application programmer's promise to the implementation that it is algorithmically safe to kill the fragments early, so you explicitly allow the change in application-visible behavior.

    It's worth noting that this type of "optimizations can't change the programmer's model" logic also applies to other fragment optimizations, such as vendor-specific hidden surface removal algorithms. Memory side-effects outside of the framebuffer tend to disable most of those too ...
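Putting the answer together, a sketch of the shader with the promise declared. The buffer and member names are hypothetical; the key line is the layout qualifier:

```glsl
#version 450

// Promise to the implementation: fragments killed by the early
// depth/stencil test never run, so their storage-buffer writes are
// skipped. Only declare this if your algorithm tolerates that.
layout(early_fragment_tests) in;

// Hypothetical storage buffer, as in the question.
layout(set = 0, binding = 0) buffer FragCounter {
    uint fragCount;
} counter;

layout(location = 0) out vec4 outColor;

void main() {
    // Now only executes for fragments that survive the early test.
    atomicAdd(counter.fragCount, 1u);
    outColor = vec4(1.0);
}
```

Note that with this qualifier the shader must not also write gl_FragDepth: once you've promised the test runs early, the shader-computed depth could no longer affect it.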