cuda, constexpr, type-punning

std::bit_cast equivalent for CUDA device side code?


I have a couple of "magic"¹ floating-point constants which I want to use bit-exactly in CUDA device-side computation, in the form of constexpr symbols. On the host side you'd use std::bit_cast<float>(0x........) for that. However, NVCC doesn't "like" std::bit_cast in device-side code.

In GLSL you'd use intBitsToFloat; however, I see no built-in function in the CUDA C++ language extensions that can do this.
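
For concreteness, a minimal sketch of the pattern in question (the bit pattern 0x3f7fffffu is just a placeholder, not one of my actual constants):

```c++
#include <bit>

// Host side: fine with any C++20 compiler.
constexpr float magic = std::bit_cast<float>(0x3f7fffffu);

// Device side: what I'd like to have, but NVCC rejects the std::bit_cast call out of the box.
__device__ float apply(float x)
{
    constexpr float m = std::bit_cast<float>(0x3f7fffffu);
    return x * m;
}
```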


1: Well, they're not that "magic": basically they're the floating-point equivalent of 0.999…·2ⁿ, that is, all bits of the mantissa set to 1 with −(n+1) added to the exponent "0" (i.e. a biased exponent of 0x7F − (n+1)).


Solution

  • Update 2 - cuda::std::bit_cast is here!

    The newest version of libcu++ has:

    "Implemented and backported C++20 bit_cast. It is available in all standard modes and constexpr with compiler support"

    It is available in the CUDA Toolkit >= 12.8 and from the CCCL repo.
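
    A minimal usage sketch, assuming CUDA Toolkit 12.8+ (or a recent CCCL checkout) and a host compiler whose bit_cast is constexpr-capable; the bit pattern 0x3f7fffffu is just an illustrative value:

    ```c++
    #include <cuda/std/bit>

    __global__ void scale(float* data)
    {
        // constexpr bit cast, evaluated at compile time inside device code;
        // 0x3f7fffffu (the largest float below 1.0f) is only an example pattern.
        constexpr float kMagic = cuda::std::bit_cast<float>(0x3f7fffffu);
        data[threadIdx.x] *= kMagic;
    }
    ```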


    Update 1 - Why you might not want to use --expt-relaxed-constexpr

    My view of --expt-relaxed-constexpr has changed after finding some funny behavior, similar to what is described in this issue, in an Nvidia project. I.e. they know about these problems, which might be the reason for the flag being deemed experimental.

    While I don't think that using std::bit_cast in particular in device code is problematic, compiling with this flag could cause accidental usage of other constexpr functions that are less basic and less safe. Also note that the flag does not only allow the use of constexpr functions at compile time, as I previously thought, but also at runtime (i.e. with non-constexpr input), which is the cause of these issues. This was probably fine at the time of its introduction, when constexpr functions were very restricted, but with newer C++ standards more and more functionality has become available in constexpr functions that is not available in device code and seems to be simply ignored, which is dangerous. With CUDA 12.8, Nvidia has added information regarding this issue to the documentation.
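
    As a sketch of the distinction (halve is a made-up helper; the point is the two ways a host constexpr function can be reached from device code under this flag):

    ```c++
    // Hypothetical host-side constexpr helper, no __device__ annotation.
    constexpr float halve(float x) { return x * 0.5f; }

    __global__ void kernel(float* out, float runtime_input)
    {
        // Compile-time use: evaluated by the host compiler during compilation,
        // only the resulting value ends up in the device code.
        constexpr float c = halve(2.0f);

        // Runtime use: also accepted under --expt-relaxed-constexpr, but now the function
        // body itself is compiled as device code; if it relied on functionality that is
        // not available on the device, this is where the problems described above arise.
        out[0] = c + halve(runtime_input);
    }
    ```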


    Initial answer - You could use --expt-relaxed-constexpr

    Given a host compiler that supports it, you can use std::bit_cast in CUDA C++20 device code (i.e. CUDA >= 12) to initialize a constexpr variable. You just need to tell nvcc to allow it by passing --expt-relaxed-constexpr.

    This flag is labeled as an "Experimental flag", but to me it sounds more like "this flag might be removed/renamed in a future release" than like "here be dragons" in terms of its results. It is also already quite old, which gives me some confidence. See the CUDA 8.0 nvcc docs from 2016 (docs for even older versions are not available online as HTML, so I didn't check further back).

    As constexpr code is evaluated by the compiler on the host, independently of the surrounding device context, I would not expect this flag to be some brittle "black magic". It just needs to pass the evaluation off to the host compiler and use the resulting value/object.

    Given all this context, I would rather expect the --expt-relaxed-constexpr behavior to become the default in some future CUDA version than to see it vanish without a replacement.
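
    A minimal sketch of this approach, assuming CUDA 12+ with a C++20-capable host compiler, compiled with something like `nvcc -std=c++20 --expt-relaxed-constexpr`; the bit pattern is again just an example value:

    ```c++
    #include <bit>  // std::bit_cast (C++20, host standard library)

    __global__ void scale_kernel(float* data)
    {
        // Evaluated at compile time by the host compiler; usable in device code
        // thanks to --expt-relaxed-constexpr.
        constexpr float kMagic = std::bit_cast<float>(0x3f7fffffu);
        data[threadIdx.x] *= kMagic;
    }
    ```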

    If you don't need constexpr

    For anyone who needs a non-constexpr version of bit_cast, see Safe equivalent of std::bit_cast in C++11 (just add __device__).
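
    For completeness, a sketch along those lines with the execution-space annotations added (the function name and the __host__ qualifier are my additions); note that it is not constexpr:

    ```c++
    #include <cstring>      // memcpy
    #include <type_traits>

    // memcpy-based bit cast in the spirit of the linked C++11 answer,
    // usable from both host and device code, but only at runtime.
    template <class To, class From>
    __host__ __device__ To bit_cast_rt(const From& src) noexcept
    {
        static_assert(sizeof(To) == sizeof(From), "sizes must match");
        static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
        static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
        To dst;
        memcpy(&dst, &src, sizeof(To));
        return dst;
    }

    __global__ void example(float* out, const unsigned int* bits)
    {
        out[threadIdx.x] = bit_cast_rt<float>(bits[threadIdx.x]);
    }
    ```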