Do bfloat types have any benefits over the fp16 provided by the VK_KHR_16bit_storage extension?


In the Vulkan API there are two extensions for using 16-bit types, namely VK_KHR_16bit_storage and VK_KHR_shader_float16_int8. So, if the hardware supports these extensions, fp16 variables can be declared, which can provide performance benefits.

Recently I read an article (link to article) about using bfloat variables in shaders. In the article, they modified the shader compiler so that, during compilation, fp32 variables are converted to the bfloat type.

My question is: does using bfloat types in a graphics pipeline have any benefits over the plain fp16 provided by the Vulkan extensions mentioned above? One benefit I see is that these extensions depend on the GPU itself. However, if the GPU supports bfloat, the compiler can convert the fp32 variables.

Thanks in advance!


Solution

  • The computational efficiency of bfloats and regular 16-bit IEEE-754 floats is fairly similar on hardware that supports bfloats. And bfloats are also 16 bits in size, so neither has faster memory access than the other.

    The primary performance advantage of bfloats over regular 16-bit IEEE-754 floats is that conversion between bfloats and 32-bit floats is trivial. You just drop the low 16 bits of the 32-bit float and what remains is the equivalent bfloat. 16-bit IEEE-754 float conversions are a bit more complex, because both the exponent and the mantissa have to be narrowed. If you're doing a lot of such conversions, this can be significant.
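
    As a rough illustration of the truncation described above, here is a minimal Python sketch. The function names are made up for this example, and it models only the plain truncating conversion; real hardware or compilers may round to nearest instead.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_to_bfloat16_bits(x: float) -> int:
    """Truncating conversion: keep only the top 16 bits of the float32."""
    return float32_bits(x) >> 16


def bfloat16_bits_to_float32(b: int) -> float:
    """Widening is exact: put the 16 bits back on top and zero-fill the rest."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


b = float32_to_bfloat16_bits(3.14159)
print(hex(b), bfloat16_bits_to_float32(b))  # 0x4049 3.140625
```

    Both directions are a single shift, which is why the conversion is so cheap compared with narrowing to a 16-bit IEEE-754 float.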

    However, the article you linked to was not about explicit usage of bfloats in a shader. It was about taking code that used 32-bit floats and changing it to use bfloats behind its back. That is, the conversion happens without the original shader author's request.

    Doing this with regular 16-bit IEEE-754 floats is very dangerous. The reason is that those floats have a massively reduced range of values: the largest finite value is 65504. If I write code that's expecting the full 32-bit float range, a conversion to 16-bit could very easily start spitting out INFs that break computations.

    However, bfloats have almost the same range as 32-bit floats, because they keep the same 8 exponent bits. They have vastly lower precision (7 explicit mantissa bits instead of 23), but their ranges are nearly identical. In terms of human-detectable visual breakage, code tends to be more tolerant of lower precision results than it is of insufficient range.

    This makes this kind of behind-the-back code conversion less likely to produce unacceptable results.
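
    The range-versus-precision trade-off can be sketched in Python as follows. This is only an illustration on the CPU, not shader code: the helper names are hypothetical, bfloat16 is modelled by truncation, and fp16 overflow is mapped to infinity by hand because Python's `struct` refuses to pack out-of-range values.

```python
import math
import struct


def via_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 by truncating a float32 to its top 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def via_fp16(x: float) -> float:
    """Round-trip x through IEEE-754 binary16, treating overflow as +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # Assumption: a hardware conversion would produce infinity here.
        return math.copysign(math.inf, x)


# Columns: original value, fp16 round trip, bfloat16 round trip.
for value in (0.1, 1000.5, 100000.0, 1.0e20):
    print(value, via_fp16(value), via_bfloat16(value))
```

    In this sketch, 1000.5 survives fp16 exactly but loses its fraction in bfloat16, while 100000.0 and 1.0e20 become infinity in fp16 yet stay finite (if coarse) in bfloat16.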