[SOLVED] Looking for examples for `atomic_fetch_add` for float32 in OpenCL 3.0

It appears that OpenCL 3.0 added support for the long-awaited atomic operations on floating-point numbers. However, after hours of searching, I still can't find a single example showing how to use these functions.

I've already been using a common hack to achieve a float32 atomic add, but I wanted to try OpenCL 3's built-in support. I tried defining a macro that calls atomic_fetch_add, as shown below:

#if __OPENCL_C_VERSION__ >= CL_VERSION_3_0
  #pragma OPENCL EXTENSION cl_ext_float_atomics : enable
  #define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b)) 
#else
  inline float atomicadd(volatile __global float* address, const float value) {
    float old = value, orig;
    while ((old = atomic_xchg(address, (orig = atomic_xchg(address, 0.0f)) + old)) != 0.0f);
    return orig;
  }
#endif

but I am getting tons of errors:

<kernel>:320:26: warning: unknown OpenCL extension 'cl_ext_float_atomics' - ignoring
#pragma OPENCL EXTENSION cl_ext_float_atomics : enable
                         ^
<kernel>:773:17: error: no matching function for call to 'atomic_fetch_add'
                atomicadd(& field[*idx1d + tshift * gcfg->dimlen.z], -p[0].w);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<kernel>:321:24: note: expanded from macro 'atomicadd'
#define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b)) 
                       ^~~~~~~~~~~~~~~~
cl_kernel.h:4571:1: note: candidate function not viable: no known conversion from 'volatile atomic_float *' to 'volatile atomic_int *__attribute__((address_space(16776963)))' for 1st argument
DECL_ATOMIC_FETCH_MOD(atomic_int, int, int)
^
cl_kernel.h:4563:3: note: expanded from macro 'DECL_ATOMIC_FETCH_MOD'
  DECL_ATOMIC_FETCH_MOD_OP(add, A, C, M) \
  ^
...

where field[] is a float buffer in global memory. My computer has 2x GTX 2080 with driver 515.x, and clinfo reports that both devices support OpenCL 3.0.

What is the right way to call atomic_fetch_add with a float type?

Solution

Making my initial comment the answer here:

Nvidia GPUs still only support the OpenCL C 1.2 language standard, as can be queried with cl_device.getInfo<CL_DEVICE_OPENCL_C_VERSION>(). The platform version is reported as 3.0, but the feature set is unchanged from 1.2. In particular, the recent cl_ext_float_atomics extension is not yet supported.
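To make that check concrete, here is a minimal host-side sketch in C. It assumes the two strings have already been fetched with clGetDeviceInfo (CL_DEVICE_OPENCL_C_VERSION and CL_DEVICE_EXTENSIONS); they are passed in as plain strings so the snippet runs without an OpenCL SDK. The helper names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parse a CL_DEVICE_OPENCL_C_VERSION string such as "OpenCL C 1.2 ".
   Returns 1 and fills major/minor on success, 0 otherwise. */
static int parse_opencl_c_version(const char *s, int *major, int *minor) {
    return s != NULL && sscanf(s, "OpenCL C %d.%d", major, minor) == 2;
}

/* Whole-word search in a space-separated extension list
   (the format of CL_DEVICE_EXTENSIONS). */
static int has_extension(const char *list, const char *name) {
    size_t n = strlen(name);
    const char *p = list;
    if (n == 0) return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends) return 1;
        p += n;
    }
    return 0;
}

/* Hypothetical helper: the built-in float atomics are only worth trying
   if the compiler speaks OpenCL C 3.0 AND the extension is listed. */
static int can_use_builtin_float_atomics(const char *c_version,
                                         const char *extensions) {
    int major = 0, minor = 0;
    if (!parse_opencl_c_version(c_version, &major, &minor)) return 0;
    return major >= 3 && has_extension(extensions, "cl_ext_float_atomics");
}
```

For a device reporting "OpenCL C 1.2 " this returns 0, which matches the situation in the question. Treat it as a necessary check rather than a sufficient one, since the extension reports which operations it supports through separate capability queries. It may also explain why the #if guard in the question selected the OpenCL 3 branch: if the compiler does not define CL_VERSION_3_0, the preprocessor evaluates that identifier as 0, so the comparison is always true.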

In theory, you could switch in code between the usual atomics_add_f workaround and the inline PTX version, based on whether the device vendor is reported as "Nvidia" or on whether some common nv_... extensions are available.
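A rough sketch of such a switch on the host side, in C, could look like this. It assumes the vendor and extension strings come from clGetDeviceInfo (CL_DEVICE_VENDOR and CL_DEVICE_EXTENSIONS). The enum, the function names, and the -D macro names are hypothetical; the kernel source would have to test those macros itself to pick its implementation.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical labels for the three ways to do a float32 atomic add. */
typedef enum {
    ADD_F32_BUILTIN,     /* cl_ext_float_atomics is advertised */
    ADD_F32_NVIDIA_PTX,  /* inline PTX on Nvidia devices */
    ADD_F32_XCHG_LOOP    /* the portable atomic_xchg workaround */
} add_f32_path;

/* Case-insensitive substring test, since vendor strings vary in casing. */
static int contains_nocase(const char *hay, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0) return 1;
    for (; *hay != '\0'; ++hay) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i])) {
            ++i;
        }
        if (i == n) return 1;
    }
    return 0;
}

/* Prefer the standard extension, then the Nvidia path, then the fallback. */
static add_f32_path choose_add_f32_path(const char *vendor,
                                        const char *extensions) {
    if (strstr(extensions, "cl_ext_float_atomics") != NULL)
        return ADD_F32_BUILTIN;
    if (contains_nocase(vendor, "nvidia") ||
        strstr(extensions, "cl_nv_") != NULL)
        return ADD_F32_NVIDIA_PTX;
    return ADD_F32_XCHG_LOOP;
}

/* Map the choice to a build option for clBuildProgram.
   The macro names are placeholders, not part of any standard. */
static const char *build_option_for(add_f32_path path) {
    switch (path) {
    case ADD_F32_BUILTIN:    return "-DATOMICADD_USE_BUILTIN";
    case ADD_F32_NVIDIA_PTX: return "-DATOMICADD_USE_NV_PTX";
    default:                 return "";
    }
}
```

The returned string would be passed as the options argument of clBuildProgram, so one kernel source can serve all three cases. Checking the extension first means the code would pick up the built-in path automatically on any device that starts advertising it.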

However, this is still not the elegant, universally compatible solution that cl_ext_float_atomics promises. It is a much-desired feature, and I hope the vendors will implement it soon.