c++multithreadingparallel-processingopenmpreduction

Reducing on array in OpenMP


I am trying to parallelize the following program, but don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative? Thanks. (I added reduction on m which is wrong but would like to have an advice on how to do it.)

#include <iostream>
#include <stdio.h>
#include <time.h>
#include <omp.h>
using namespace std;

int main ()
{
  int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
  int S [10];

  time_t start_time = time(NULL);
  #pragma omp parallel for private(m) reduction(+:m)
  for (int n=0 ; n<10 ; ++n ){
    for (int m=0; m<=n; ++m){
      S[n] += A[m];
    }
  }
  time_t end_time = time(NULL);
  cout << end_time-start_time;

  return 0;
}

Solution

  • Yes it is possible to do an array reduction with OpenMP. In Fortran it even has construct for this. In C/C++ you have to do it yourself. Here are two ways to do it.

    The first method makes private version of S for each thread, fill them in parallel, and then merges them into S in a critical section (see the code below). The second method makes an array with dimentions 10*nthreads. Fills this array in parallel and then merges it into S without using a critical section. The second method is much more complicated and can have cache issues especially on multi-socket systems if you are not careful. For more details see this Fill histograms (array reduction) in parallel with OpenMP without using a critical section

    First method

    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10] = {0};
    #pragma omp parallel
    {
        int S_private[10] = {0};
        #pragma omp for
        for (int n=0 ; n<10 ; ++n ) {
            for (int m=0; m<=n; ++m){
                S_private[n] += A[m];
            }
        }
        #pragma omp critical
        {
            for(int n=0; n<10; ++n) {
                S[n] += S_private[n];
            }
        }
    }
    

    Second method

    int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
    int S [10] = {0};
    int *S_private;
    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();
    
        #pragma omp single 
        {
            S_private = new int[10*nthreads];
            for(int i=0; i<(10*nthreads); i++) S_private[i] = 0;
        }
        #pragma omp for
        for (int n=0 ; n<10 ; ++n )
        {
            for (int m=0; m<=n; ++m){
                S_private[ithread*10+n] += A[m];
            }
        }
        #pragma omp for
        for(int i=0; i<10; i++) {
            for(int t=0; t<nthreads; t++) {
                S[i] += S_private[10*t + i];
            }
        }
    }
    delete[] S_private;