c++ cuda half-precision-float

How can I do arithmetic on CUDA's __half type in host side code?


I have a kernel I'm running on an NVIDIA GPU, which uses the FP16 type __half, provided by cuda_fp16.hpp. To check something about its behavior, I also want to manipulate such __half values on the CPU. However, when I try to apply arithmetic operators to them in host-side code, I get a compilation error, since the operators are defined as __device__-only functions:

__device__ __forceinline__ __half operator+(const __half &lh, const __half &rh) { return __hadd(lh, rh); }
__device__ __forceinline__ __half operator-(const __half &lh, const __half &rh) { return __hsub(lh, rh); }
__device__ __forceinline__ __half operator*(const __half &lh, const __half &rh) { return __hmul(lh, rh); }
__device__ __forceinline__ __half operator/(const __half &lh, const __half &rh) { return __hdiv(lh, rh); }

How, then, can I use these values on the host (other than by converting up to float and back down for every operation, I mean)?
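
For concreteness, the per-operation round-trip I'd like to avoid looks roughly like this (host_half_add is just an illustrative name; the __half2float/__float2half conversions, unlike the arithmetic operators, are host-callable):

#include <cuda_fp16.h>
#include <cstdio>

// Illustrative helper: host-side addition of two __half values by
// round-tripping through float, since operator+ is __device__-only here.
__half host_half_add(__half a, __half b) {
    return __float2half(__half2float(a) + __half2float(b));
}

int main() {
    __half x = __float2half(1.5f);
    __half y = __float2half(2.25f);
    std::printf("%f\n", __half2float(host_half_add(x, y))); // prints 3.750000
    return 0;
}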

Note: Using CUDA 11.2.


Solution

  • As @RobertCrovella mentions in a comment, newer versions of CUDA (e.g. 12.2) also offer host-side versions of these functions; see the first sketch below.

    Alternatively, C++23 offers a proper 16-bit floating-point type, std::float16_t, to which you can apply arithmetic operators directly; std::bfloat16_t is also available (second sketch below). See:

    Fixed width floating-point types (since C++23), on cppreference.com.

    For older C++ standard versions, you can use a compiler-specific half-precision floating-point type and its builtins, such as GCC's _Float16, to perform the computation (third sketch below).
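
    To illustrate the first option: with a new-enough toolkit (12.2 or later), the same operators are also compiled for the host, so plain host code along these lines should build with nvcc (a minimal sketch, not verified against every 12.x release):

    #include <cuda_fp16.h>
    #include <cstdio>

    int main() {
        __half a = __float2half(0.5f);
        __half b = __float2half(0.25f);
        // With CUDA 12.2+, operator+ on __half is compiled for the host
        // too, so this no longer requires __device__ code:
        __half sum = a + b;
        std::printf("%f\n", __half2float(sum)); // prints 0.750000
        return 0;
    }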
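
    To illustrate the second option, here is a minimal sketch assuming a standard library that ships <stdfloat> (e.g. GCC 13+ with -std=c++23). Since std::float16_t and __half are both IEEE-754 binary16 on the platforms I'm aware of, you should also be able to std::bit_cast values between them, but verify that assumption on your target:

    #include <stdfloat>  // C++23
    #include <cstdio>

    int main() {
        std::float16_t a = 1.5f16;   // the f16 literal suffix is also C++23
        std::float16_t b = 0.25f16;
        std::float16_t sum = a + b;  // plain arithmetic, no intrinsics needed
        std::printf("%f\n", static_cast<float>(sum)); // prints 1.750000
        return 0;
    }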
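
    And to illustrate the last option with GCC's _Float16 (availability depends on the target; on x86 it requires GCC 12 or later), another minimal sketch:

    #include <cstdio>

    int main() {
        // _Float16 is the GCC/Clang extension type from ISO/IEC TS 18661-3;
        // the compiler emits hardware FP16 instructions where available,
        // or software emulation otherwise.
        _Float16 a = 1.5f16;
        _Float16 b = 0.25f16;
        _Float16 sum = a + b;
        std::printf("%f\n", (float)sum); // prints 1.750000
        return 0;
    }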