c++clanginline-assemblyhip

HIP-Clang Inline Assembly


What is the Hip-Clang equivalent of this CUDA function?

__device__ __forceinline__ uint32_t add_cc(uint32_t a, uint32_t b)
{
   uint32_t r;
   asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
   return r;
}

I'm porting a CUDA project to HIP-Clang that contains inline PTX assembly. The function is used to implement multi-precision addition in the NVIDIA GPU. I tried:

asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid operand for instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=v"(r) : "v"(a), "v"(b)); //operands are not valid for this GPU or mode

The target hardware is RX 6800. AMD clang version 14.0.0.
Is RDNA2 the correct instruction set reference?
Is this LLVM user guide to AMDGPU backend an applicable reference?


Solution

  • It turns out the answer is hardware dependent. For my hardware for which the compiler defines __gfx1030__ the correct syntax is

    asm volatile ("v_add_co_u32 %0, vcc_lo, %1, %2;" : "=v"(r) : "v"(a), "v"(b));
    

    For earlier architechtures such as __gfx900__ replace vcc_lo with vcc
    See the discussion on the Rocm Hip Github and this AMD gpu assembly reference.