c++ parallel-processing openmp

OpenMP Program with Nested For-Loop in While-Loop Occasionally Hangs


I'm attempting to parallelize a for-loop within a while-loop using OpenMP and encountered an issue where the program intermittently hangs, especially when the condition variable approaches 1. Below is the simplified code snippet:

#include <omp.h>
#include <stdio.h>

void task(int thread_id, int condition) {
    printf("Hello from thread %d with condition %d\n", thread_id, condition);
}

int main() {
    int condition = 5;
    #pragma omp parallel num_threads(4) shared(condition)
    {
        while(condition) {
            #pragma omp master
            {
                // Update condition in master thread
                condition--;
                #pragma omp flush(condition)
            }
            #pragma omp barrier // Wait for master to update condition
    
            #pragma omp for
            for(int i = 0; i < 4; ++i) {
                task(omp_get_thread_num(), condition);
            }
        }
    }
    return 0;
}

Output example (hangs when condition is 1. Ideally, the program should stop when condition equals 0.):

Hello from thread 0 with condition 4
Hello from thread 1 with condition 4
Hello from thread 2 with condition 4
Hello from thread 3 with condition 4
Hello from thread 3 with condition 3
Hello from thread 1 with condition 3
Hello from thread 0 with condition 3
Hello from thread 2 with condition 3
Hello from thread 2 with condition 2
Hello from thread 3 with condition 2
Hello from thread 1 with condition 2
Hello from thread 0 with condition 2
Hello from thread 1 with condition 1
Hello from thread 3 with condition 1
Hello from thread 0 with condition 1
Hello from thread 2 with condition 1
...

Despite the use of #pragma omp flush(condition) to ensure the condition variable's visibility and #pragma omp barrier to synchronize threads, the program sometimes stalls, seemingly deadlocked, particularly when condition reaches 1. It appears that the while-loop does not exit as expected.

I've also tried adding an additional barrier at the end of the while-loop, but the issue persists. Could this be due to a deadlock or another synchronization issue? Any insights or solutions would be greatly appreciated. Thank you in advance!


Solution

  • Compiling the code with ThreadSanitizer enabled, as follows, reports the data race in your code:

    $ clang -O3 -g -fopenmp -fsanitize=thread so-78260058.c
    $ ./a.out
    WARNING: ThreadSanitizer: data race (pid=80729)
      Read of size 4 at 0x7ffee9bf9638 by thread T2:
        #0 main.omp_outlined_debug__ so-78260058.c:12:15 (a.out+0xe60b8)
        #1 main.omp_outlined so-78260058.c:10:5 (a.out+0xe6345)
        #2 __kmp_invoke_microtask <null> (libomp.so+0xbcbd2)
        #3 main so-78260058.c:10:5 (a.out+0xe6058)
    
      Previous write of size 4 at 0x7ffee9bf9638 by main thread:
        #0 main.omp_outlined_debug__ so-78260058.c:16:26 (a.out+0xe6101)
        #1 main.omp_outlined so-78260058.c:10:5 (a.out+0xe6345)
        #2 __kmp_invoke_microtask <null> (libomp.so+0xbcbd2)
        #3 main so-78260058.c:10:5 (a.out+0xe6058)
    
      Location is stack of main thread.
    
      Location is global '??' at 0x7ffee9bdb000 ([stack]+0x1e638)
    
      Thread T2 (tid=80732, running) created by main thread at:
        #0 pthread_create llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1020:234 (a.out+0x5f1bb)
        #1 __kmp_create_worker <null> (libomp.so+0x9c096)
    
    SUMMARY: ThreadSanitizer: data race so-78260058.c:12:15 in main.omp_outlined_debug__
    

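    The report is about a read in line 12 (while(condition)) and a write in line 16 (condition--;). The column also clearly identifies the condition variable. This race explains the hang: the master thread can decrement condition to 0 before another thread has evaluated the loop condition, so that thread can leave the loop while the other threads are still waiting for it at a barrier inside the loop.

    The minimal change to fix the data race, as discussed in the comments, is to add a barrier before the master construct. The flush is superfluous, and the barrier after the master construct is not needed for this particular race either, because the implicit barrier at the end of the for region already separates the write from the next evaluation of the loop condition.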
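A minimal sketch of that change, assuming the rest of the question's program stays as it is. The helper name run_rounds and the calls counter are hypothetical and exist only so the loop can be checked. The serial fallback for omp_get_thread_num is likewise an addition so the sketch builds without -fopenmp. The barrier after the master construct is kept here as a conservative choice, because the loop body passes condition to task and would otherwise read it while the master thread may still be writing it.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption) so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
#endif

void task(int thread_id, int condition) {
    printf("Hello from thread %d with condition %d\n", thread_id, condition);
}

/* Hypothetical helper: runs the question's loop starting from `start`
   and returns how many times task() was called. */
int run_rounds(int start) {
    int condition = start;
    int calls = 0;
    #pragma omp parallel num_threads(4) shared(condition, calls)
    {
        while (condition) {
            /* New barrier: every thread has evaluated while(condition)
               before the master thread is allowed to modify it. */
            #pragma omp barrier
            #pragma omp master
            {
                condition--;
            }
            /* Kept so that the loop body below reads the updated value. */
            #pragma omp barrier

            #pragma omp for
            for (int i = 0; i < 4; ++i) {
                task(omp_get_thread_num(), condition);
                #pragma omp atomic
                calls++;
            }
            /* The implicit barrier at the end of the for region orders the
               write above before the next evaluation of while(condition). */
        }
    }
    return calls;
}
```

Calling run_rounds(5) from main goes through five rounds (condition 4 down to 0) and returns 20, with all threads leaving the while-loop together.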

    Easier to read is the following code, which opens a new parallel region in every iteration of the while-loop:

    #include <omp.h>
    #include <stdio.h>
    
    void task(int thread_id, int condition) {
      printf("Hello from thread %d with condition %d\n", thread_id, condition);
    }
    
    int main() {
      int condition = 5;
      while (condition) {
        condition--;
    
    #pragma omp parallel for num_threads(4)
        for (int i = 0; i < 4; ++i) {
          task(omp_get_thread_num(), condition);
        }
      }
      return 0;
    }
    

    Since most OpenMP implementations maintain a thread pool, the overhead of starting a parallel region in each iteration is not significantly higher than that of the two barriers that would otherwise be necessary. (If you use nested parallelism, check the KMP_HOT_TEAMS_MAX_LEVEL environment variable of the LLVM/Intel OpenMP runtime.)