c++ openmp

How can OpenMP know how many loop instances are nested?


Is the nesting depth counted explicitly by the compiler?


Solution

  • The OpenMP runtime keeps track of this information in thread-local variables.

    libgomp, probably one of the most popular OpenMP implementations, is open source, so you can freely read not just its documentation but also its entire source code.

    The implementation of omp_get_level() is here:

    int
    omp_get_level (void)
    {
      return gomp_thread ()->ts.level;
    }
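
    To see the counter from the user side, here is a minimal sketch that enters several nested parallel regions and reads omp_get_level() and omp_get_active_level() in the innermost one. The names probe_nested, Levels and kHaveOpenMP are made up for this illustration, and the stub functions in the #else branch are only a fallback so the file still builds when OpenMP is not enabled. Each region is forced to a single thread, so it counts as nested but not as active.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
constexpr bool kHaveOpenMP = true;
#else
// Fallback stubs (not part of OpenMP) so the sketch still builds without -fopenmp.
constexpr bool kHaveOpenMP = false;
static int omp_get_level() { return 0; }
static int omp_get_active_level() { return 0; }
#endif

// Hypothetical helper type: the two counters as seen by the innermost region.
struct Levels {
    int level;
    int active_level;
};

// Enters `depth` nested parallel regions, each with a team of one thread,
// and returns the nesting counters observed inside the innermost region.
Levels probe_nested(int depth) {
    assert(depth >= 0);
    if (depth == 0)
        return {omp_get_level(), omp_get_active_level()};

    Levels result{0, 0};
    // A one-thread team is an inactive region: it should raise the nesting
    // level but not the active level. With one thread there is no race on result.
    #pragma omp parallel num_threads(1)
    {
        result = probe_nested(depth - 1);
    }
    return result;
}
```

    When built with OpenMP enabled (for GCC, -fopenmp), probe_nested(3) should report a level of 3 and an active level of 0. The nesting here is dynamic (through recursion), which suggests the count cannot come from the compiler looking at the source text alone.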
    

    The implementation of gomp_thread() is here. It retrieves a pointer to a thread-local structure.

    #if defined __nvptx__
    extern struct gomp_thread *nvptx_thrs __attribute__((shared));
    static inline struct gomp_thread *gomp_thread (void)
    {
      int tid;
      asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
      return nvptx_thrs + tid;
    }
    #elif defined HAVE_TLS || defined USE_EMUTLS
    extern __thread struct gomp_thread gomp_tls_data;
    static inline struct gomp_thread *gomp_thread (void)
    {
      return &gomp_tls_data;
    }
    #else
    extern pthread_key_t gomp_tls_key;
    static inline struct gomp_thread *gomp_thread (void)
    {
      return pthread_getspecific (gomp_tls_key);
    }
    #endif
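
    The thread-local branch above can be illustrated in standard C++. This is only a sketch of the pattern, not libgomp code: ThreadState, tls_state, current_thread_state and bump_on_worker are hypothetical names, and thread_local stands in for the __thread extension used in the original.

```cpp
#include <thread>

// Hypothetical stand-in for the per-thread state; not the real struct gomp_thread.
struct ThreadState {
    unsigned level = 0;
};

// One instance per thread, analogous to the __thread variable in the TLS branch.
static thread_local ThreadState tls_state;

// Accessor in the spirit of gomp_thread(): returns the calling thread's own state.
inline ThreadState *current_thread_state() {
    return &tls_state;
}

// Increments the counter `times` times on a separate worker thread and
// returns the value that the worker observed in its own copy.
unsigned bump_on_worker(unsigned times) {
    unsigned seen = 0;
    std::thread worker([&seen, times] {
        for (unsigned i = 0; i < times; ++i)
            ++current_thread_state()->level;
        seen = current_thread_state()->level;
    });
    worker.join();
    return seen;
}
```

    Calling bump_on_worker(3) returns 3, while current_thread_state()->level read on the calling thread is unchanged, because every thread gets its own copy of the structure.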
    

    The ts member is a struct gomp_team_state that contains, among other fields:

      [...]
      /* Nesting level.  */
      unsigned level;
    
      /* Active nesting level.  Only active parallel regions are counted.  */
      unsigned active_level;
      [...]
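
    As a rough, single-threaded model of how these two counters relate, consider the sketch below. It is an illustration, not libgomp's implementation: TeamState, RegionGuard and innermost_state are hypothetical names, the rule "a region is active when its team has more than one thread" follows the field comment and the OpenMP definition of an active region, and restoring the saved state on exit is an assumption of this model.

```cpp
#include <cassert>

// Hypothetical model of the two counters; the field names mirror those quoted above.
struct TeamState {
    unsigned level = 0;         // every enclosing parallel region
    unsigned active_level = 0;  // only regions whose team has more than one thread
};

// Per-thread state, as in the runtime's thread-local structure.
static thread_local TeamState ts;

// RAII guard: models entering a parallel region with `nthreads` threads
// and puts the previous state back when the region ends.
class RegionGuard {
public:
    explicit RegionGuard(unsigned nthreads) : saved_(ts) {
        assert(nthreads >= 1);
        ++ts.level;
        if (nthreads > 1)
            ++ts.active_level;
    }
    ~RegionGuard() { ts = saved_; }

    RegionGuard(const RegionGuard &) = delete;
    RegionGuard &operator=(const RegionGuard &) = delete;

private:
    TeamState saved_;  // state of the enclosing region
};

// Models three nested regions and returns the counters seen in the innermost one.
TeamState innermost_state() {
    RegionGuard outer(4);   // active: four threads
    RegionGuard middle(1);  // inactive: one thread
    RegionGuard inner(2);   // active: two threads
    return ts;              // copied before the guards unwind
}
```

    In this model innermost_state() yields level 3 and active_level 2, and both counters are back at 0 once the guards have been destroyed.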
    

    Whenever #pragma omp parallel is used, the compiler extracts the body of the parallel region into a separate function and generates a fairly involved chain of runtime calls that eventually leads to gomp_team_start(), which contains:

    #ifdef LIBGOMP_USE_PTHREADS
    void
    gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
                     unsigned flags, struct gomp_team *team)
    {
    
      [...]
    
      ++thr->ts.level;
      if (nthreads > 1)
        ++thr->ts.active_level;