c++ clang openmp nvidia offloading

OpenMP task reduction with target offloading segfaults when running single threaded


I was using OpenMP with target offload and found that my application segfaults when limiting OpenMP to a single thread. I could boil it down to the following snippet:

#include <omp.h>

int main(){
    int res = 0;

    #pragma omp parallel num_threads(1)
    {
        #pragma omp single
        {
            #pragma omp taskgroup task_reduction(+:res)
            {
                #pragma omp target in_reduction(+:res) nowait
                {
                    res++;
                }
            }
        }
    }
}

Compiled with

clang++ -fopenmp -fopenmp-targets=nvptx64 --offload-arch=sm_61 -O0 main.cpp

using clang 17.0.0 and CUDA 12.1, and run on an Ubuntu 22.04 machine with a 12700K/1080 Ti.

This segfaults when num_threads is set to 1. It works fine with more than one thread (e.g. num_threads(2)), and it also works when nowait is omitted from the target task, so that the task synchronizes at the end of the target region.

From my understanding, this should work just fine even with a single thread.


Solution

  • According to mailing-list posts and GitHub issues, support for in_reduction in clang is incomplete or missing at the time of writing.
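
Until that support lands, one possible way around it is to keep in_reduction off the target construct entirely. The sketch below is an assumption on my part and has not been verified against the clang 17 / CUDA 12.1 setup from the question: a host task takes part in the reduction, runs a synchronous target region that writes into a task-local variable, and then adds that value to res. The function name offload_increment_sum and the n_tasks parameter are made up for illustration.

```cpp
// Hypothetical workaround sketch: in_reduction sits on a host task,
// not on the target construct.
int offload_increment_sum(int n_tasks) {
    int res = 0;

    #pragma omp parallel num_threads(1)
    {
        #pragma omp single
        {
            #pragma omp taskgroup task_reduction(+:res)
            {
                for (int i = 0; i < n_tasks; ++i) {
                    // Host task participates in the taskgroup reduction.
                    #pragma omp task in_reduction(+:res)
                    {
                        int partial = 0;

                        // Synchronous offload (no nowait); the result comes
                        // back through an explicit map instead of in_reduction.
                        #pragma omp target map(tofrom: partial)
                        {
                            partial++;
                        }

                        // Accumulate into the task's private reduction copy.
                        res += partial;
                    }
                }
            }
        }
    }
    return res;
}
```

Each host task blocks on its own target region, so the tasks can only overlap when more than one thread is available. The reduction itself is handled on the host, where task reductions are better supported.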