
In OpenMP (C/C++), how is each thread's private array initialized when an array reduction is applied?


OpenMP 4.5 supports reductions over arrays and array sections, e.g. reduction(+:array[:N]), analogous to the earlier scalar reduction reduction(+:scalar).

There is plenty of information on how a scalar reduction variable is initialized privately in each thread (things like omp_priv and omp_orig), but I cannot find much on the initialization of private arrays. Is each thread's private array initialized (a) as a copy of the original array defined before the parallel block, or (b) as an all-zero array?


Solution

  • It depends on the reduction operator. For addition, each thread's private array starts as all zeros; for multiplication, it starts as all ones. The OpenMP specification treats an array or array section in a reduction clause as if the reduction were applied to each separate element, so every element is initialized with the operator's identity value from Table 2.11 of https://www.openmp.org/spec-html/5.0/openmpsu107.html, exactly as in a scalar reduction. If this is not what you want, you can declare your own reduction with declare reduction, using omp_out and omp_in in the combiner and omp_priv and omp_orig in the initializer clause. An initializer such as omp_priv = omp_orig is then applied to each element of the array as an individual scalar.

    #include <iostream>
    #include <omp.h>
    
    #define NTHREADS 2
    #define SIZE 3
    #define NLOOPS 4
    
    int main() {
    
        int* array = new int[SIZE];
        for (int i = 0; i < SIZE; i++) array[i] = i;
        // Initial array: 0 1 2
    
        // Addition: each thread prints its private copy, which it never modifies
        #pragma omp parallel for reduction(+:array[:SIZE]) num_threads(NTHREADS)
        for (int iiter = 0; iiter < NLOOPS; iiter++) {
            #pragma omp critical
            for (int j = 0; j < SIZE; j++)
                std::cout << array[j];
        }
        std::cout << std::endl;
        // Output: 000000000000 (private copies start at 0; array is still 0 1 2 afterwards)
    
        // Multiplication
        #pragma omp parallel for reduction(*:array[:SIZE]) num_threads(NTHREADS)
        for (int iiter = 0; iiter < NLOOPS; iiter++) {
            #pragma omp critical
            for (int j = 0; j < SIZE; j++)
                std::cout << array[j];
        }
        std::cout << std::endl;
        // Output: 111111111111 (private copies start at 1; array is still 0 1 2 afterwards)
    
        // User-defined reduction: each private element starts at twice the original element
        #pragma omp declare reduction(MySum: int: omp_out += omp_in) initializer(omp_priv = 2 * omp_orig)
        #pragma omp parallel for reduction(MySum:array[:SIZE]) num_threads(NTHREADS)
        for (int iiter = 0; iiter < NLOOPS; iiter++) {
            #pragma omp critical
            for (int j = 0; j < SIZE; j++)
                std::cout << array[j];
        }
        std::cout << std::endl;
        // Output: 024024024024
    
        delete[] array;  // release the heap buffer
        return 0;
    }
    

Tested on GCC 8.3.0 and ICC 19.1.1.217.