In OpenMP (C/C++), what is the initial private array of each thread when array reduction is to be applied?

OpenMP 4.5 supports reduction over raw arrays through array sections, written as reduction(+:array[:]), analogous to the older scalar form reduction(+:scalar).

There is plenty of information on how a scalar to be reduced is initialized privately for each thread (things like omp_priv and omp_orig), but I cannot find much on the initialization of private arrays. Is the private array of each thread initialized as (a) a copy of the original array defined above the parallel block, or (b) an all-zero array?

Solution

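It depends on the reduction operator. For addition, each thread's private array starts as all zeros; for multiplication, it starts as all ones. In other words, each element of the private copy is initialized to the identity value of the operator, following Table 2.11 of https://www.openmp.org/spec-html/5.0/openmpsu107.html, just as for scalar reduction. The original array is not copied into the private arrays; its values are only combined with them at the end of the region. If this is not what you want, you can declare your own reduction with declare reduction, using the identifiers omp_out and omp_in in the combiner and omp_priv and omp_orig in the initializer. An initializer such as omp_priv = omp_orig appears to be applied to each element of the array as an individual scalar.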

#include <iostream>
#include <omp.h>

#define NTHREADS 2
#define SIZE 3
#define NLOOPS 4

int main(){

    int* array = new int[SIZE];
    for ( int i = 0; i < SIZE; i++ ) array[i] = i;
    // Initial array 0 1 2

    // Addition: each private copy starts at 0, the identity of +
    #pragma omp parallel for reduction(+:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ )
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    std::cout << std::endl;
    // 000000000000

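    // Multiplication: each private copy starts at 1, the identity of *
    // (array is still 0 1 2 here, since the previous reduction only added zeros)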
    #pragma omp parallel for reduction(*:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ )
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    std::cout << std::endl;
    // 111111111111

    // User-defined reduction: each private element starts at 2 * its original value
    #pragma omp declare reduction(MySum: int: omp_out += omp_in) initializer(omp_priv = 2 * omp_orig)
    #pragma omp parallel for reduction(MySum:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ )
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    std::cout << std::endl;
    // 024024024024

    delete[] array;  // release the array allocated with new[]
    return 0;
}

Tested on GCC-8.3.0 and ICC-19.1.1.217.