I have pyopencl-based code which runs perfectly fine for 3-dimensional work groups, but when moving to 4-dimensional work groups, it breaks down with the error:
pyopencl._cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_DIMENSION
Digging around, I found this answer to another question, which implies that OpenCL does in fact allow higher-dimensional work groups.
So my question is whether it is possible to change this setting in pyopencl. From this other answer elsewhere, I understand that pyopencl passes the dimensions straight through, but given the error I get, I think there must be some issue.
This is a minimal code sample that reproduces the error. The code works for the first kernel function but breaks down on the second one.
import pyopencl as cl
import numpy as np
context = cl.create_some_context()
queue = cl.CommandQueue(context)
kernel_code = """
__kernel void fun3d(__global double *output)
{
    size_t i = get_global_id(0);
    size_t j = get_global_id(1);
    size_t k = get_global_id(2);
    size_t I = get_global_size(0);
    size_t J = get_global_size(1);
#
    size_t idx = k*J*I + j*I + i;
#
    output[idx] = idx;
}
__kernel void fun4d(__global double *output)
{
    size_t i = get_global_id(0);
    size_t j = get_global_id(1);
    size_t k = get_global_id(2);
    size_t l = get_global_id(3);
    size_t I = get_global_size(0);
    size_t J = get_global_size(1);
    size_t K = get_global_size(2);
#
    size_t idx = l*K*J*I + k*J*I + j*I + i;
#
    output[idx] = idx;
}
"""
program = cl.Program(context, kernel_code).build()
I = 2
J = 3
K = 4
L = 5
output3d = np.zeros((I*J*K)).astype(np.float64)
cl_output3d = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, output3d.nbytes)
program.fun3d(queue, (I,J,K), None, cl_output3d)
cl.enqueue_copy(queue, output3d, cl_output3d)
queue.finish()
import code; code.interact(local=dict(globals(), **locals()))
# 4d attempt
output4d = np.zeros((I*J*K*L)).astype(np.float64)
cl_output4d = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, output4d.nbytes)
program.fun4d(queue, (I,J,K,L), None, cl_output4d)
cl.enqueue_copy(queue, output4d, cl_output4d)
queue.finish()
Specifying more dimensions than the implementation supports is not going to work.
The maximum number of supported dimensions can be queried via CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, or in a terminal, for example:
$ clinfo | grep dim
Max work item dimensions 3
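If the device really is limited to 3 dimensions, one possible workaround is to fold the fourth dimension into the third. The sketch below is only an illustration, not part of pyopencl: `fold_global_size`, `split_folded_id` and `device_max_dims` are hypothetical helper names. It assumes the same flat index layout as the kernels in the question, and it reads the limit through the `max_work_item_dimensions` device attribute, which should correspond to CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. The loop at the end checks in plain Python that the folded 3D index matches the intended 4D index.

```python
import math


def fold_global_size(global_size, max_dims=3):
    """Collapse trailing dimensions so that at most max_dims remain."""
    if len(global_size) <= max_dims:
        return tuple(global_size)
    head = tuple(global_size[:max_dims - 1])
    tail = math.prod(global_size[max_dims - 1:])
    return head + (tail,)


def split_folded_id(gid, K):
    """Recover (k, l) from a folded index gid = l*K + k."""
    return gid % K, gid // K


def device_max_dims():
    """Map each device name to its maximum number of work-item dimensions."""
    import pyopencl as cl  # imported lazily; the helpers above do not need OpenCL
    return {dev.name: dev.max_work_item_dimensions
            for platform in cl.get_platforms()
            for dev in platform.get_devices()}


# Same sizes as in the question.
I, J, K, L = 2, 3, 4, 5
folded = fold_global_size((I, J, K, L))

# Compare the 4D flat index from fun4d with the 3D flat index from fun3d
# when the third dimension has size K*L.
mismatches = 0
for i in range(I):
    for j in range(J):
        for kl in range(folded[2]):
            k, l = split_folded_id(kl, K)
            idx4d = l*K*J*I + k*J*I + j*I + i
            idx3d = kl*J*I + j*I + i
            mismatches += idx4d != idx3d

print(folded, mismatches)  # (2, 3, 20) 0
```

Under these assumptions, launching the existing fun3d kernel with the global size (I, J, K*L) on the 4D-sized buffer should fill it the same way fun4d was meant to. A kernel that needs k and l separately could recover them from get_global_id(2) with a modulo and a division, with K passed in as an extra argument.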