openclreductionjocl

Using a barrier causes a CL_INVALID_WORK_GROUP_SIZE error


If I use a barrier (no matter if CLK_LOCAL_MEM_FENCE or CLK_GLOBAL_MEM_FENCE) in my kernel, it causes a CL_INVALID_WORK_GROUP_SIZE error. The global work size is 512, the local work size is 128, 65536 items have to be computed, the max work group size of my device is 1024, I am using only one dimension. For Java bindings I use JOCL. The kernel is very simple:

kernel void sum(global float *input, global float *output, const int numElements, local float *localCopy
{
    localCopy[get_local_id(0)] = grid[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE); // or barrier(CLK_GLOBAL_MEM_FENCE)
}

I run the kernel on the Intel(R) Xeon(R) CPU X5570 @ 2.93GHz and can use OpenCL 1.2. The calling method looks like

kernel.putArg(aCLBuffer).putArg(bCLBuffer).putArg(elementCount).putNullArg(localWorkSize);
queue.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize);

But the error is always the same:

[...]can not enqueue 1DRange CLKernel [...] with gwo: null gws: {512} lws: {128} 
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]

What I am doing wrong?


Solution

  • This is expected behaviour on some OpenCL platforms. For example, on my Apple system, the CPU device has a maximum work-group size of 1024. However, if a kernel has a barrier inside, then the maximum work-group size for that specific kernel is reduced to 1.

    You can query the maximum work-group size for a specific kernel by using the clGetKernelWorkGroupInfo function with the CL_KERNEL_WORK_GROUP_SIZE parameter. The value returned will be no more than the value returned by clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE, but is allowed to be less (as it is in this case).