
Getting CL_INVALID_WORK_GROUP_SIZE on matrix operations


I am passing a matrix in global memory and processing each row (vector) in local memory. The actual matrix passed in is 100 x 2025, but in the kernel I pad it with zeros so that I can use power-of-2 operations. Each work-item processes 4 elements of the vector.

MAX_WORK_ITEM_SIZES: (512,512,512)
MAX_WORK_GROUP_SIZE: 512

size_t globalWorkSize[2] = { 100, 2048 };
size_t localWorkSize[1] = { 512 };

I've also tried making localWorkSize two-dimensional, {1, 512}, but I get the same error, CL_INVALID_WORK_GROUP_SIZE, on this function call:

err = clEnqueueNDRangeKernel( openCLObjects.queue, openCLObjects.Normalize, 2, NULL,
                    globalWorkSize, localWorkSize, 0, NULL, NULL );

Any idea what could be going wrong?

Thanks.


Solution

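  • Device property: CL_DEVICE_MAX_WORK_GROUP_SIZE, queried with clGetDeviceInfo (generic upper limit for a device)

    Kernel property: CL_KERNEL_WORK_GROUP_SIZE, queried with clGetKernelWorkGroupInfo (specific limit for this kernel as compiled for the device)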


    The first one is fixed for each device and is probably limited by how many work-items can be addressed in full SIMD mode.

    The second limit is per kernel, and it is the one you should use. It takes into account things specific to your code, such as how much private memory the kernel needs.

    Do you meet the second requirement as well?
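If the per-kernel limit turns out to be lower than 512, the local size has to shrink to fit it. Here is a minimal sketch in plain C, assuming the limit has already been read (for example with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE) and is passed in as a plain number. The helper name largest_local_size is made up for illustration and is not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the largest local size that does not exceed
 * `limit` and divides `global_size` evenly. In the worst case it returns 1. */
size_t largest_local_size(size_t global_size, size_t limit)
{
    assert(global_size > 0 && limit > 0);

    /* Start from the smaller of the two values and walk downwards. */
    size_t local = limit < global_size ? limit : global_size;
    while (global_size % local != 0)
        --local;   /* always terminates: 1 divides every global size */
    return local;
}
```

For the 2048-wide dimension in the question, a kernel limit of 256 would give a local size of 256, so localWorkSize would become { 1, 256 }.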

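    BTW, whatever the limits are, with work_dim = 2 the runtime reads two entries from each size array, so both arrays need two elements. You should always use: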

    size_t globalWorkSize[2] = { 100, 2048 };
    size_t localWorkSize[2] = { 1, 512 };
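The size rules that apply to these two arrays can be checked on the host before enqueueing. The sketch below is plain C with no OpenCL dependency. check_ndrange is a hypothetical helper, and max_group_size is assumed to be the per-kernel limit obtained from the kernel query. It follows the OpenCL 1.x rule that each global size must be a multiple of the matching local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that mirrors two size checks done at enqueue time
 * (OpenCL 1.x rules). Returns 1 if the sizes are acceptable, 0 otherwise. */
int check_ndrange(unsigned work_dim,
                  const size_t *global, const size_t *local,
                  size_t max_group_size)
{
    assert(work_dim >= 1 && work_dim <= 3);

    size_t group_items = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local[d] == 0)
            return 0;                  /* a local size of zero is never valid */
        if (global[d] % local[d] != 0)
            return 0;                  /* global size is not a multiple of local size */
        group_items *= local[d];       /* work-items per group so far */
    }
    return group_items <= max_group_size;
}
```

With the sizes from the question and a limit of 512, { 1, 512 } passes. { 512, 1 } fails because 100 is not a multiple of 512, and { 2, 512 } fails because 1024 work-items per group exceeds the limit.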