
OpenCL 2.0 full profile, without atomic_store & atomic_load? Is this possible?


I use the OpenCL.NET C# wrapper for OpenCL.

According to GPU-Z, my GPU is an AMD Radeon (Barcelo) with OpenCL 2.0 full profile support.

Part of the code:

// probably useless
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_global_int32_extended_atomics : enable
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

void vector_is_zero_partial(
    uint row,
    uint row_to,
    __global const double *x,
    double tolerance,
    __global atomic_int *is_zero)
{
    for (; row < row_to; ++row)
    {
        if (fabs(x[row]) > tolerance)
        {
            atomic_store(is_zero, 0);
            break;
        }
        if (!atomic_load(is_zero)) break;
    }
}

The error:

C:\Users\CHAMEL~1\AppData\Local\Temp\\OCL8036T0.cl:264:4: error: implicit declaration of function 'atomic_store' is invalid in C99
                        atomic_store(is_zero, 0);
                        ^
C:\Users\CHAMEL~1\AppData\Local\Temp\\OCL8036T0.cl:267:8: error: implicit declaration of function 'atomic_load' is invalid in C99
                if (!atomic_load(is_zero)) break;
                     ^
2 errors generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
 

So the atomic extensions exist and OpenCL is version 2.0, but atomic_store and atomic_load do not exist.

Did I do something wrong here?


Solution

  • atomic_load and atomic_store, when called without explicit memory-order and scope arguments, require the device features __opencl_c_atomic_order_seq_cst and __opencl_c_atomic_scope_device. As you are on the AMD-APP platform, these may not be available. The output of clinfo will tell you for sure.

    Two options that could be considered are:

    1. Try the Mesa drivers with rusticl. As your GPU is GCN, this should get you OpenCL 3.0 support. As you are likely on Windows, you would probably need to use WSL to get it to work.
    2. Change your function to use the legacy OpenCL atomics. Something like this should work:
    void vector_is_zero_partial(
        uint row,
        uint row_to,
        __global const double *x,
        double tolerance,
        volatile __global int *is_zero) // legacy atomics take a plain int pointer, not atomic_int
    {
        for (; row < row_to; ++row)
        {
            if (fabs(x[row]) > tolerance)
            {
                atomic_xchg(is_zero, 0); // stores 0 to *is_zero and returns the previous value
                break;
            }
            /* atomic_max stores max(*is_zero, 0) back to *is_zero and
            returns the previous value. For a flag that is only ever
            0 or 1 the stored value does not change, so this acts as
            an atomic read of the flag. */
            if (!atomic_max(is_zero, 0)) break;
        }
    }
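
If the same source has to build both with and without the C11-style atomics, one option is to choose the flag operations at compile time. What follows is only a sketch and has not been tried on AMD-APP. The names USE_C11_STYLE_ATOMICS, GLOBAL_AS, flag_t, FLAG_CLEAR and FLAG_READ are made up for illustration, the legacy branch assumes OpenCL 1.1 or later, and the host branch is a plain C11 fallback added only so the loop logic can be unit tested without a GPU. Note that __OPENCL_C_VERSION__ reports the language version the compiler is actually using, which depends on the -cl-std build option, so that may be worth checking too.

```c
/* Sketch only: all macro and type names below are hypothetical. */
#if defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 300
  /* OpenCL C 3.0: the non-explicit atomics are optional features. */
  #if defined(__opencl_c_atomic_order_seq_cst) && \
      defined(__opencl_c_atomic_scope_device)
    #define USE_C11_STYLE_ATOMICS 1
  #endif
#elif defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 200
  #define USE_C11_STYLE_ATOMICS 1   /* OpenCL C 2.0 */
#elif !defined(__OPENCL_VERSION__)
  #define USE_C11_STYLE_ATOMICS 1   /* host build, C11 <stdatomic.h> */
#endif

#if defined(__OPENCL_VERSION__)
  /* Device build. */
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
  #define GLOBAL_AS __global
#else
  /* Host build, used only for testing the logic. */
  #include <math.h>
  #include <stdatomic.h>
  #define GLOBAL_AS
#endif

#ifdef USE_C11_STYLE_ATOMICS
  typedef atomic_int flag_t;
  #define FLAG_CLEAR(p) atomic_store((p), 0)
  #define FLAG_READ(p)  atomic_load((p))
#else
  /* Legacy OpenCL 1.x atomics work on a volatile int in global memory. */
  typedef volatile int flag_t;
  #define FLAG_CLEAR(p) ((void)atomic_xchg((p), 0))
  #define FLAG_READ(p)  atomic_or((p), 0)  /* OR with 0 leaves the value unchanged */
#endif

void vector_is_zero_partial(
    unsigned int row,
    unsigned int row_to,
    GLOBAL_AS const double *x,
    double tolerance,
    GLOBAL_AS flag_t *is_zero)
{
    for (; row < row_to; ++row)
    {
        if (fabs(x[row]) > tolerance)
        {
            FLAG_CLEAR(is_zero);            /* found a non-zero element */
            break;
        }
        if (!FLAG_READ(is_zero)) break;     /* another work-item already cleared it */
    }
}
```

In all three branches the flag is expected to be a single 32-bit integer that the caller initialises to 1 before the first call, so the host-side buffer should not need to change.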