opencl dynamic-parallelism

CL_OUT_OF_RESOURCES error is returned by clEnqueueNDRangeKernel() with dynamic parallelism


Kernel code that produces the error:

__kernel void testDynamic(__global int *data)
{
    int id=get_global_id(0);
    atomic_add(&data[1],2);
}

__kernel void test(__global int * data)
{
    int id=get_global_id(0);
    atomic_add(&data[0],2);
    if (id == 0) {
        queue_t q = get_default_queue();
        ndrange_t ndrange = ndrange_1D(1,1);
        void (^my_block_A)(void) = ^{testDynamic(data);};
        enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange,
                       my_block_A);
    }

}

I tested the code below to be sure the OpenCL 2.0 compiler is working.

__kernel void test2(__global int *data)
{
    int id=get_global_id(0);
    data[id]=work_group_scan_inclusive_add(id);
}

The scan function gives 0, 1, 3, 6 as outputs, so the OpenCL 2.0 work-group scan functions are working.
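As a sanity check on those numbers, here is a minimal host-side sketch of what work_group_scan_inclusive_add(get_global_id(0)) should produce. It assumes a single work-group of 4 work-items (inferred from the four outputs; the launch size is not stated above), and reference_inclusive_scan is a hypothetical helper name, not part of OpenCL.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Host-side reference for work_group_scan_inclusive_add(id), assuming one
// work-group of n work-items whose ids are 0, 1, ..., n-1 (an assumption).
std::vector<int> reference_inclusive_scan(int n) {
    assert(n >= 0);
    std::vector<int> ids(n);
    std::iota(ids.begin(), ids.end(), 0);  // ids = 0, 1, ..., n-1

    // Inclusive scan: out[i] = ids[0] + ids[1] + ... + ids[i].
    std::vector<int> out(n);
    std::partial_sum(ids.begin(), ids.end(), out.begin());
    return out;
}
```

Calling reference_inclusive_scan(4) yields 0, 1, 3, 6, which matches the values read back from test2.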

Is dynamic parallelism an extension to OpenCL 2.0? If I remove the enqueue_kernel command, the results equal the expected values (omitting the child kernel).

Device: AMD RX 550, driver: 17.6.2

Is there a special command that needs to be run on the host side to run a child kernel on the get_default_queue() queue? For now, the command queue is created the OpenCL 1.2 way, as below:

commandQueue = cl::CommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);

Does get_default_queue() have to be the same command queue that launches the parent kernel? I ask because I'm using the same command queue to upload data to the GPU and then download the results, in a single synchronization.


Solution

  • Moved solution from question to answer:

    Edit: the API call below was the solution:

    commandQueue = cl::CommandQueue(context, device,
        CL_QUEUE_ON_DEVICE |
        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
        CL_QUEUE_ON_DEVICE_DEFAULT, &err);
    

    After creating this queue (only one per device), I didn't use it for anything else. The parent kernel is enqueued on any other host queue, so it looks like get_default_queue() doesn't have to be the queue that launches the parent kernel.

    The documentation says CL_INVALID_QUEUE_PROPERTIES will be raised if CL_QUEUE_ON_DEVICE is specified, but on my machine dynamic parallelism works with it and no such error occurs (with the CommandQueue constructor parameters shown above).
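For readers wondering why the flags in the solution go together: as far as I understand the OpenCL 2.0 queue-property rules, CL_QUEUE_ON_DEVICE must be combined with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, and CL_QUEUE_ON_DEVICE_DEFAULT is only valid together with CL_QUEUE_ON_DEVICE. The sketch below encodes just those two rules. The constants are local mirrors of the bit values in CL/cl.h under hypothetical names, so it builds without the OpenCL headers; the helper functions are illustrative and not part of any OpenCL API.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the queue-property bits from CL/cl.h (hypothetical names,
// redefined here so the sketch compiles without the OpenCL SDK).
constexpr std::uint64_t kOutOfOrder      = 1u << 0;  // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr std::uint64_t kProfiling       = 1u << 1;  // CL_QUEUE_PROFILING_ENABLE
constexpr std::uint64_t kOnDevice        = 1u << 2;  // CL_QUEUE_ON_DEVICE
constexpr std::uint64_t kOnDeviceDefault = 1u << 3;  // CL_QUEUE_ON_DEVICE_DEFAULT

// True if the flags respect the two pairing rules:
//   ON_DEVICE         requires OUT_OF_ORDER
//   ON_DEVICE_DEFAULT requires ON_DEVICE
constexpr bool flags_are_consistent(std::uint64_t props) {
    return ((props & kOnDevice) == 0 || (props & kOutOfOrder) != 0) &&
           ((props & kOnDeviceDefault) == 0 || (props & kOnDevice) != 0);
}

// True if the flags describe a default device queue, i.e. the kind of queue
// a kernel expects to get back from get_default_queue().
constexpr bool is_default_device_queue(std::uint64_t props) {
    return flags_are_consistent(props) && (props & kOnDeviceDefault) != 0;
}

// The two queues discussed in this thread.
inline void check_thread_examples() {
    // Question: host queue created with CL_QUEUE_PROFILING_ENABLE only.
    assert(!is_default_device_queue(kProfiling));
    // Solution: on-device, out-of-order, default.
    assert(is_default_device_queue(kOnDevice | kOutOfOrder | kOnDeviceDefault));
}
```

Under these rules the profiling-only host queue from the question is a perfectly valid queue, but it is not a default device queue, which fits the observation that enqueue_kernel only worked once a separate on-device default queue existed.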