
Does Clang support dynamic parallelism in CUDA?


Dynamic parallelism means that a kernel launches other kernels from device code. It is possible to compile CUDA programs with clang, but does clang support dynamic parallelism?
I get the following error when I try to compile a CUDA program that uses dynamic parallelism:

input.cc:222:5: error: reference to __global__ function 'TargetKernel' in __global__ function
  222 |     TargetKernel<<<32,45>>>(2,2+6);

Is there a specific flag I need to pass to enable dynamic parallelism support in clang? I have tried -rdc=true and --cuda-gpu-arch=sm_86, with no success. I am using clang 20.

A simple example code for dynamic parallelism is below:

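#include <cstdio>

// Child kernel, launched from device code by ParentKernel.
__global__ void ChildKernel() {
    printf("Hello from ChildKernel!\n");
}

// Parent kernel: the nested launch below is the dynamic-parallelism call that clang rejects.
__global__ void ParentKernel() {
    printf("Hello from ParentKernel!\n");
    ChildKernel<<<1, 1>>>();
}

int main() {
    ParentKernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the parent (and its child grid) to finish
    return 0;
}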

Solution

  • As @JérômeRichard mentioned in a comment above:

    1. According to the llvm/clang forum:

      Dynamic parallelism is not implemented in clang.
      ...
      I’ve seen virtually no demand for this feature, so it’s not a high priority.

      This statement dates from February 2023.

    2. No commit in the LLVM/Clang Git repository appears to have addressed the issue since then.

    Bottom line:
    As of now, clang does not support CUDA dynamic parallelism.
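
If the code has to build with clang, one possible workaround is to move the child launch to the host. The sketch below is not from the original answer. It assumes the child's launch configuration does not depend on values computed inside the parent kernel, which is true for the toy example in the question but may not hold for real code. If device-side launches are essential, nvcc supports them when compiling with relocatable device code (-rdc=true) and linking against cudadevrt.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void ChildKernel() {
    printf("Hello from ChildKernel!\n");
}

__global__ void ParentKernel() {
    printf("Hello from ParentKernel!\n");
    // No device-side launch here: clang rejects ChildKernel<<<...>>>()
    // inside a __global__ function.
}

int main() {
    // Both kernels go into the default stream, so they execute in launch
    // order: ChildKernel starts only after ParentKernel has finished.
    ParentKernel<<<1, 1>>>();
    ChildKernel<<<1, 1>>>();

    // Wait for both kernels and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This keeps the parent-then-child ordering but gives up the main benefit of dynamic parallelism, which is deciding on the device how much follow-up work to launch.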