opencl pyopencl

Is it possible to run a 4-dimensional work item in pyopencl?


I have pyopencl-based code that runs perfectly fine for 3-dimensional work groups, but when I move to 4-dimensional work groups, it fails with the error:

pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_DIMENSION

Digging around, I found this answer to another question, which implies that OpenCL in fact allows higher-dimensional work groups.

So my question is whether this setting can be changed in pyopencl. From another answer, I understand that pyopencl passes the dimensions straight through to OpenCL, but given the error I get, I think there must be some issue.

This is a minimal example that reproduces the error. The first kernel function works; the second one fails.


import pyopencl as cl
import numpy as np

context = cl.create_some_context() 
queue = cl.CommandQueue(context)  


kernel_code = """
__kernel void fun3d( __global double *output)
{
    size_t i = get_global_id(0);
    size_t j = get_global_id(1);
    size_t k = get_global_id(2);
    size_t I = get_global_size(0);
    size_t J = get_global_size(1);
    // flatten the 3D index into a linear offset
    size_t idx = k*J*I + j*I + i;
    output[idx] = idx;
}

__kernel void fun4d( __global double *output)
{
    size_t i = get_global_id(0);
    size_t j = get_global_id(1);
    size_t k = get_global_id(2);
    size_t l = get_global_id(3);
    size_t I = get_global_size(0);
    size_t J = get_global_size(1);
    size_t K = get_global_size(2);
    // flatten the 4D index into a linear offset
    size_t idx = l*K*J*I + k*J*I + j*I + i;
    output[idx] = idx;
}
"""

program = cl.Program(context, kernel_code).build()

I = 2
J = 3
K = 4
L = 5

output3d = np.zeros((I*J*K)).astype(np.float64)
cl_output3d = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, output3d.nbytes)

program.fun3d(queue, (I,J,K), None, cl_output3d)
cl.enqueue_copy(queue, output3d, cl_output3d)
queue.finish()


# 4d attempt

output4d = np.zeros((I*J*K*L)).astype(np.float64)
cl_output4d = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, output4d.nbytes)

program.fun4d(queue, (I,J,K,L), None, cl_output4d)
cl.enqueue_copy(queue, output4d, cl_output4d)
queue.finish()

Solution

  • Specifying more dimensions than the implementation supports is not going to work.

    The maximum number of supported dimensions can be queried via CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, or in a terminal, for example:

    $ clinfo | grep dim
    Max work item dimensions 3
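
If the device reports a limit of 3, one possible workaround is to fold the fourth dimension into the third and recover both indices inside the kernel. The following is only a sketch under that assumption: the names fold_global_size, unfold_index, fun4d_folded and run_folded_kernel are made up for illustration and are not part of pyopencl. Like the code in the question, the kernel assumes the device supports double.

```python
import math

# Hypothetical kernel: same indexing as fun4d, but k and l share axis 2.
KERNEL_SRC = """
__kernel void fun4d_folded(__global double *output, const uint K)
{
    size_t i  = get_global_id(0);
    size_t j  = get_global_id(1);
    size_t kl = get_global_id(2);   // combined index, kl = l*K + k
    size_t k  = kl % K;
    size_t l  = kl / K;
    size_t I  = get_global_size(0);
    size_t J  = get_global_size(1);
    size_t idx = l*K*J*I + k*J*I + j*I + i;
    output[idx] = idx;
}
"""


def fold_global_size(shape, max_dims=3):
    """Merge all axes from position max_dims-1 onward into a single axis."""
    shape = tuple(int(s) for s in shape)
    if len(shape) <= max_dims:
        return shape
    return shape[:max_dims - 1] + (math.prod(shape[max_dims - 1:]),)


def unfold_index(kl, K):
    """Host-side mirror of the kernel's decode step: kl -> (k, l)."""
    return kl % K, kl // K


def run_folded_kernel(I=2, J=3, K=4, L=5):
    """Run the folded kernel; needs pyopencl, numpy and an OpenCL device."""
    import numpy as np
    import pyopencl as cl

    context = cl.create_some_context()
    queue = cl.CommandQueue(context)

    # Same value that clinfo reports as "Max work item dimensions".
    max_dims = context.devices[0].max_work_item_dimensions
    assert max_dims >= 3  # the kernel above uses three axes

    program = cl.Program(context, KERNEL_SRC).build()
    output = np.zeros(I * J * K * L, dtype=np.float64)
    cl_output = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, output.nbytes)

    # (I, J, K, L) becomes (I, J, K*L); K is passed so the kernel can split kl.
    global_size = fold_global_size((I, J, K, L), 3)
    program.fun4d_folded(queue, global_size, None, cl_output, np.uint32(K))
    cl.enqueue_copy(queue, output, cl_output)
    queue.finish()
    return output
```

Calling run_folded_kernel() should fill the buffer with the same linear offsets that fun4d was meant to write. The two helper functions need no OpenCL device, so the folding logic can be checked on its own.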