OpenMP 4.5 supports reduction over raw arrays, reduction(+:array[:]), analogous to the earlier scalar reduction, reduction(+:scalar).
There is plenty of information on how a scalar to be reduced can be initialized privately for each thread (things like omp_priv and omp_orig), but I cannot find much on the initialization of private arrays.
I wonder whether the private array of each thread is initialized as
(a) a copy of the original array defined above the parallel block or
(b) an all-zero array?
It depends on the reduction type.
For addition, the initial array is all-zero.
For multiplication, the initial array is all-one.
The choice of the initial array probably follows Table 2.11 of https://www.openmp.org/spec-html/5.0/openmpsu107.html, just like scalar reduction.
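One consequence worth spelling out: because the private copies start at the identity value, the contents of the original array are not lost. They are combined with the per-thread results once the loop finishes. A minimal sketch of this, assuming a hypothetical helper add_per_iteration and a fixed SIZE of 3 (neither is part of OpenMP):

```cpp
constexpr int SIZE = 3;  // illustrative array length (assumption)

// Hypothetical helper: adds 1 to every element once per loop iteration,
// using an array-section reduction on a raw pointer.
void add_per_iteration(int* array, int iterations) {
    // Each thread works on a private copy of array[0..SIZE-1] that starts
    // at 0 for '+'. At the end, the private copies are added to the
    // original values.
    #pragma omp parallel for reduction(+:array[:SIZE])
    for (int it = 0; it < iterations; ++it)
        for (int j = 0; j < SIZE; ++j)
            array[j] += 1;
}
```

Starting from 0 1 2 and running 4 iterations, the array should end up as 4 5 6 regardless of the number of threads, since the original values are added in exactly once.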
If this is not what you want, you can also declare your own reduction with #pragma omp declare reduction, using the special identifiers omp_out and omp_in in the combiner and omp_priv and omp_orig in the initializer.
Initializer expressions such as omp_priv = omp_orig seem to be applied to each element of the array as an individual scalar.
#include <iostream>
#include <omp.h>

#define NTHREADS 2
#define SIZE 3
#define NLOOPS 4

int main(){
    int* array = new int[SIZE];
    for ( int i = 0; i < SIZE; i++ ) array[i] = i;
    // Initial array 0 1 2

    // Addition
    #pragma omp parallel for reduction(+:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ ){
        // Print the thread-private copy; critical keeps the output from interleaving
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    }
    std::cout << std::endl;
    // 000000000000

    // Multiplication
    #pragma omp parallel for reduction(*:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ ){
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    }
    std::cout << std::endl;
    // 111111111111

    // User-defined reduction
    #pragma omp declare reduction(MySum: int: omp_out += omp_in) initializer(omp_priv = 2 * omp_orig)
    #pragma omp parallel for reduction(MySum:array[:SIZE]) num_threads(NTHREADS)
    for ( int iiter = 0; iiter < NLOOPS ; iiter++ ){
        #pragma omp critical
        for ( int j = 0; j < SIZE; j++ )
            std::cout << array[j];
    }
    std::cout << std::endl;
    // 024024024024

    delete[] array;
    return 0;
}
Tested on GCC-8.3.0 and ICC-19.1.1.217.
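As a further illustration of a user-defined reduction whose initializer is applied element by element, here is a sketch of an element-wise maximum. The names ElemMax and elementwise_max are made up for this example, and the layout of samples (rows of SIZE values stored contiguously) is an assumption. With initializer(omp_priv = omp_orig), each private element starts as a copy of the corresponding original element, which is harmless for max because combining the same value several times does not change the result.

```cpp
#include <algorithm>

constexpr int SIZE = 3;  // illustrative array length (assumption)

// Hypothetical reduction: element-wise maximum, private copies start
// from the original element values.
#pragma omp declare reduction(ElemMax : int : omp_out = std::max(omp_out, omp_in)) \
    initializer(omp_priv = omp_orig)

// Hypothetical helper: samples holds nsamples rows of SIZE ints each.
void elementwise_max(int* array, const int* samples, int nsamples) {
    #pragma omp parallel for reduction(ElemMax : array[:SIZE])
    for (int s = 0; s < nsamples; ++s)
        for (int j = 0; j < SIZE; ++j)
            array[j] = std::max(array[j], samples[s * SIZE + j]);
}
```

Note that the same initializer would not be safe for a sum: every thread would start from the original values, so they would be counted once per thread plus once more in the final combination.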