Tags: parallel-processing, cuda, opencl

How to use 128bit float and complex numbers in OpenCL/CUDA?


I need to use 128-bit floating-point numbers and complex numbers for parallel GPU computing with OpenCL or CUDA.
Is there any way to achieve this without implementing it myself?

I looked at the OpenCL and CUDA specifications and found no float128 support. Is it really impossible to use float128 with them? I also searched for libraries, but none seem to exist. Is that correct?

At a minimum I would like to be able to use float128; is that achievable?


Solution

  • No modern GPU or CPU has hardware FP128 support. GPUs only have circuitry for FP32, with limited or no FP64 support, and neither OpenCL nor CUDA exposes FP128.

    You have to implement the format yourself, emulating conversion and arithmetic on a struct of two 64-bit integers. The same goes for complex numbers.
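    As an illustration (my sketch, not part of the original answer): instead of full IEEE binary128 emulation, a common lighter-weight approach is double-double arithmetic, which represents a value as an unevaluated sum of two FP64 numbers (about 32 significant decimal digits versus binary128's 34). A minimal host-side C sketch using Knuth's error-free two-sum, which ports directly into an OpenCL or CUDA kernel:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    /* Double-double value: hi holds the leading bits, lo the trailing
       correction term. Together they carry ~106 bits of significand. */
    typedef struct { double hi, lo; } dd;

    /* Error-free addition of two doubles (Knuth's two-sum):
       returns s = fl(a + b) plus the exact rounding error. */
    static dd two_sum(double a, double b) {
        double s   = a + b;
        double bb  = s - a;
        double err = (a - (s - bb)) + (b - bb);
        return (dd){ s, err };
    }

    /* Add two double-double values and renormalize the result. */
    static dd dd_add(dd a, dd b) {
        dd s = two_sum(a.hi, b.hi);
        double lo = s.lo + a.lo + b.lo;
        return two_sum(s.hi, lo);
    }

    int main(void) {
        /* 1 + 2^-70 cannot be represented in a single double
           (ulp of 1.0 is 2^-52), but double-double keeps it. */
        dd one  = { 1.0, 0.0 };
        dd tiny = { pow(2.0, -70), 0.0 };
        dd sum  = dd_add(one, tiny);
        assert(sum.hi == 1.0);
        assert(sum.lo == pow(2.0, -70)); /* tiny part survives in lo */
        printf("hi=%g lo=%g\n", sum.hi, sum.lo);
        return 0;
    }
    ```

    Multiplication can be built the same way from an FMA-based two-product; the compensated-arithmetic trick requires only strict IEEE rounding, which both CUDA and OpenCL FP64 provide.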

    I have very fast FP16<->FP32 conversion algorithms here, which are adaptable to FP64<->FP128. For the arithmetic you will have to roll your own.

    What do you even need 34 decimal digits for? Is there a way to get the same result with FP64? Quite often numeric trickery lets you avoid catastrophic cancellation (digit extinction) in lower-precision formats.
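    One classic example of such trickery (my illustration, not from the original answer): Kahan compensated summation, which recovers most of the precision that a plain FP64 accumulation loses when adding many small values to a large one:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    /* Kahan compensated summation: carries the rounding error of each
       addition in c, so small addends are not silently dropped. */
    static double kahan_sum(const double *x, int n) {
        double sum = 0.0, c = 0.0;
        for (int i = 0; i < n; i++) {
            double y = x[i] - c;
            double t = sum + y;
            c = (t - sum) - y;  /* exact error of this addition */
            sum = t;
        }
        return sum;
    }

    /* static: 8 MB array would overflow a typical stack */
    static double xs[1000001];

    int main(void) {
        /* 1e16 plus a million 0.1s: ulp(1e16) = 2, so naive
           summation drops every single 0.1. */
        xs[0] = 1e16;
        for (int i = 1; i <= 1000000; i++) xs[i] = 0.1;

        double naive = 0.0;
        for (int i = 0; i <= 1000000; i++) naive += xs[i];

        double kahan = kahan_sum(xs, 1000001);

        double exact = 1.00000000001e16;  /* 1e16 + 1e5 */
        assert(fabs(kahan - exact) < fabs(naive - exact));
        printf("naive=%.3f kahan=%.3f\n", naive, kahan);
        return 0;
    }
    ```

    Tricks like this (and pairwise summation, rescaling, or algebraic reformulation) often remove the need for a wider format entirely.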

    Also, have a look at Posit formats. If your application only does arithmetic close to 1, a 64-bit Posit carries considerably more precision there than FP64 and could be good enough.