It appears that OpenCL 3.0 added support for the long-awaited atomic operations on floating-point numbers. However, after hours of searching, I still can't find a single example showing how to use these functions.
I've already been using a common hack to achieve a float32 atomic add, but I wanted to try OpenCL 3's built-in support, so I tried defining a macro that calls atomic_fetch_add, like below:
#if __OPENCL_C_VERSION__ >= CL_VERSION_3_0
#pragma OPENCL EXTENSION cl_ext_float_atomics : enable
#define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b))
#else
inline float atomicadd(volatile __global float* address, const float value) {
    float old = value, orig;
    while ((old = atomic_xchg(address, (orig = atomic_xchg(address, 0.0f)) + old)) != 0.0f);
    return orig;
}
#endif
but I am getting tons of errors:
<kernel>:320:26: warning: unknown OpenCL extension 'cl_ext_float_atomics' - ignoring
#pragma OPENCL EXTENSION cl_ext_float_atomics : enable
^
<kernel>:773:17: error: no matching function for call to 'atomic_fetch_add'
atomicadd(& field[*idx1d + tshift * gcfg->dimlen.z], -p[0].w);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<kernel>:321:24: note: expanded from macro 'atomicadd'
#define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b))
^~~~~~~~~~~~~~~~
cl_kernel.h:4571:1: note: candidate function not viable: no known conversion from 'volatile atomic_float *' to 'volatile atomic_int *__attribute__((address_space(16776963)))' for 1st argument
DECL_ATOMIC_FETCH_MOD(atomic_int, int, int)
^
cl_kernel.h:4563:3: note: expanded from macro 'DECL_ATOMIC_FETCH_MOD'
DECL_ATOMIC_FETCH_MOD_OP(add, A, C, M) \
^
...
where field[] is a global-memory float buffer. My computer has 2x GTX 2080 with driver 515.x, and clinfo reports that both devices support OpenCL 3.0.
What is the right way to call atomic_fetch_add with a float type?
Making my initial comment the answer here:
Nvidia GPUs still only support the OpenCL C 1.2 language standard, as can be queried with cl_device.getInfo<CL_DEVICE_OPENCL_C_VERSION>(). The platform version is reported as 3.0, but the feature set is unchanged from 1.2; in particular, the recent cl_ext_float_atomics extension is not yet supported.
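To make the distinction between the platform version and the language version concrete, here is a minimal host-side sketch in plain C. It assumes you have already fetched the CL_DEVICE_OPENCL_C_VERSION and CL_DEVICE_EXTENSIONS strings with clGetDeviceInfo; the helper names (parse_opencl_c_version, has_extension, can_use_float_atomics_ext) are made up for illustration and are not part of any OpenCL API. The "OpenCL C 2.0 or newer" requirement in the last helper is my own assumption, based on atomic_float being introduced with the C11-style atomics in OpenCL C 2.0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a string such as "OpenCL C 1.2 " (the layout used by
 * CL_DEVICE_OPENCL_C_VERSION) into major/minor numbers.
 * Returns 1 on success, 0 if the string does not have that layout. */
int parse_opencl_c_version(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    return sscanf(s, "OpenCL C %d.%d", major, minor) == 2;
}

/* Return 1 if `name` occurs as a whole space-separated token in `ext_list`
 * (the layout used by CL_DEVICE_EXTENSIONS), 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t n = strlen(name);
    if (n == 0) return 0;
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends) return 1;
        p += n; /* partial match only, keep searching */
    }
    return 0;
}

/* Assumption (hypothetical policy): only take the cl_ext_float_atomics path
 * when the device compiler reports OpenCL C 2.0 or newer AND the extension
 * is listed by the device. */
int can_use_float_atomics_ext(const char *c_version, const char *ext_list)
{
    int major = 0, minor = 0;
    if (!parse_opencl_c_version(c_version, &major, &minor)) return 0;
    return major >= 2 && has_extension(ext_list, "cl_ext_float_atomics");
}
```

With a device that reports "OpenCL C 1.2 " and does not list cl_ext_float_atomics, can_use_float_atomics_ext returns 0, so the host would keep using the atomic_xchg workaround.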
In theory, you could switch in code between the usual atomics_add_f workaround and the inline PTX version, based on whether the device vendor is reported as "Nvidia" or on whether some common nv_... extensions are available.
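A rough sketch of what the host side of such a switch might look like, again in plain C. It assumes the vendor and extension strings come from clGetDeviceInfo (CL_DEVICE_VENDOR, CL_DEVICE_EXTENSIONS) and that the returned string is passed as the options argument of clBuildProgram. The macro names USE_NV_PTX_ATOMIC_ADD and USE_XCHG_ATOMIC_ADD are hypothetical; the kernel source would have to test them with #ifdef to pick one of the two atomicadd implementations.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical build options; the kernel source is assumed to branch on them. */
#define OPT_NV_PTX   "-DUSE_NV_PTX_ATOMIC_ADD"
#define OPT_FALLBACK "-DUSE_XCHG_ATOMIC_ADD"

/* Case-insensitive ASCII substring search: 1 if `needle` occurs in `hay`. */
int contains_nocase(const char *hay, const char *needle)
{
    assert(hay != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0) return 1;
    for (; *hay != '\0'; ++hay) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            ++i;
        if (i == n) return 1;
    }
    return 0;
}

/* Pick the inline PTX path when the vendor string mentions Nvidia or the
 * device lists any cl_nv_* extension; otherwise use the xchg workaround. */
const char *select_atomic_add_option(const char *vendor, const char *ext_list)
{
    assert(vendor != NULL && ext_list != NULL);
    if (contains_nocase(vendor, "nvidia") || strstr(ext_list, "cl_nv_") != NULL)
        return OPT_NV_PTX;
    return OPT_FALLBACK;
}
```

The vendor match is case-insensitive because the exact capitalization of the reported vendor string is not something I would rely on.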
However, this is still not the elegant, universally compatible solution that cl_ext_float_atomics promises. It's a much-desired feature, and I hope the vendors will implement it soon.