cuda, dynamic-parallelism, gpu-cooperative-groups

Why is cudaLaunchCooperativeKernel() returning not permitted?


I am using a GTX 1050 (compute capability 6.1) with CUDA 11.0. My program needs grid synchronization, so I have to launch the kernel with cudaLaunchCooperativeKernel(). According to deviceQuery, the GPU supports cooperative groups. However, I am unable to launch the following kernel:

extern "C" __global__ void test(int x) {
    if (x) {
        printf("%d", x);
        if (threadIdx.x == 0)
            test<<<1, 1>>>(--x);
    }
}

When I call

cudaLaunchCooperativeKernel((void *)test, 1, 1, (void **) (&x));

I get the error 'operation not permitted' (code 800). As far as I understand, this error is returned when the device does not support cooperative groups, which is not the case here. What else could cause this problem?


Solution

  • Your kernel uses dynamic parallelism: it launches itself from device code with test<<<1, 1>>>(--x). Dynamic parallelism is not allowed in kernels launched via cudaLaunchCooperativeKernel(), which is why the launch fails with 'operation not permitted'.

    This restriction is stated in the runtime API documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html
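
    One possible way around the restriction is to replace the recursive child launch with a loop inside a single cooperative kernel. The following is only a minimal sketch of that idea, not the asker's actual program. The names countdown and launchCountdown are hypothetical, and it assumes compilation for the GTX 1050 with something like nvcc -arch=sm_61. It also passes the kernel arguments as an array of pointers (void *args[] = { &x }), which is the form the args parameter of cudaLaunchCooperativeKernel() expects, instead of casting &x directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical iterative replacement for the recursive child launch:
// each pass prints the current value, then the whole grid synchronizes
// before the next pass. No device-side kernel launch is involved.
extern "C" __global__ void countdown(int x) {
    cg::grid_group grid = cg::this_grid();
    for (; x > 0; --x) {
        if (grid.thread_rank() == 0)
            printf("%d", x);
        grid.sync();  // grid-wide barrier; requires a cooperative launch
    }
}

// Hypothetical host helper: checks for cooperative launch support,
// launches the kernel cooperatively, and returns the first error seen.
cudaError_t launchCountdown(int x) {
    int device = 0;
    int supported = 0;

    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    err = cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device);
    if (err != cudaSuccess) return err;
    if (!supported) return cudaErrorNotSupported;

    // args holds one pointer per kernel parameter.
    void *args[] = { &x };
    err = cudaLaunchCooperativeKernel((void *)countdown, dim3(1), dim3(1),
                                      args, 0, 0);
    if (err != cudaSuccess) return err;

    // Wait for the kernel to finish and surface any execution error.
    return cudaDeviceSynchronize();
}
```

    A caller could then write cudaError_t err = launchCountdown(3); and print cudaGetErrorString(err) if err is not cudaSuccess. The loop keeps the whole computation in one kernel, so grid.sync() stays available between passes.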