I have a kernel I'm running on an NVIDIA GPU, which uses the FP16 type __half, provided by cuda_fp16.hpp. To check something about its behavior, I also want to manipulate such __half values on the CPU. However, when I try to apply arithmetic operations to them in host-side code, I get an error, since they're defined like so:
__device__ __forceinline__ __half operator+(const __half &lh, const __half &rh) { return __hadd(lh, rh); }
__device__ __forceinline__ __half operator-(const __half &lh, const __half &rh) { return __hsub(lh, rh); }
__device__ __forceinline__ __half operator*(const __half &lh, const __half &rh) { return __hmul(lh, rh); }
__device__ __forceinline__ __half operator/(const __half &lh, const __half &rh) { return __hdiv(lh, rh); }
How can I, then, use these values on the host (other than by upcasting and downcasting every time, I mean)?
Note: Using CUDA 11.2.
As @RobertCrovella mentions in a comment, newer versions of CUDA (e.g. 12.2) also offer host-side versions of these functions.
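For illustration, a minimal host-only sketch assuming a CUDA 12.2+ toolkit (the literal values are arbitrary):

#include <cuda_fp16.h>
#include <cstdio>

int main() {
    // With CUDA 12.2 and newer, the arithmetic operators for __half are
    // also defined for host code, so this compiles and runs on the CPU.
    __half a = __float2half(1.5f);
    __half b = __float2half(0.25f);
    __half c = a + b;
    std::printf("a + b = %f\n", __half2float(c));
}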
Alternatively, C++23 offers a proper 16-bit floating-point type, std::float16_t, to which you can apply arithmetic operations; std::bfloat16_t is also available. See: Fixed width floating-point types (since C++23) on cppreference.com.
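A minimal sketch, assuming a C++23 toolchain whose standard library provides <stdfloat> (e.g. recent GCC with libstdc++); the memcpy into __half relies on both types sharing the IEEE binary16 layout:

#include <stdfloat>
#include <cstring>
#include <cstdio>
#include <cuda_fp16.h>   // only needed for the interop with __half below

int main() {
    std::float16_t a = 1.5f16;
    std::float16_t b = 0.25f16;
    std::float16_t c = a + b;        // ordinary host arithmetic

    // std::float16_t and __half are both IEEE binary16, so the result
    // can be moved into a __half with a plain bit copy.
    __half h;
    std::memcpy(&h, &c, sizeof h);

    std::printf("a + b = %f\n", static_cast<float>(c));
}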
For older C++ standard versions, you can use the available compiler-specific half-precision floating-point types and builtins, such as GCC's _Float16, to perform the computation.
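For instance, a sketch assuming a GCC (or Clang) version and target that support the _Float16 extension type; the f16 literal suffix and the bit copy into __half are illustrative:

#include <cstring>
#include <cstdio>
#include <cuda_fp16.h>   // only needed for the interop with __half below

int main() {
    _Float16 a = 1.5f16;             // f16 suffix per ISO/IEC TS 18661-3
    _Float16 b = 0.25f16;
    _Float16 c = a + b;              // host arithmetic on the extension type

    // _Float16 uses the IEEE binary16 format, same as __half,
    // so a bit copy carries the value across.
    __half h;
    std::memcpy(&h, &c, sizeof h);

    std::printf("a + b = %f\n", (float)c);
}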