I want to parallelize this code to get the best performance. "histogram" stores the number of appearances of each colour (there are 10 different colours, so the size of histogram is 10). "img" is an array that stores the image data; each element of img holds a colour (an int value in the range 0..9). This is the code:
for( i=0; i<N1; i++ ){
    for( j=0; j<N2; j++ ){
        histogram[ img[i][j] ] = histogram[ img[i][j] ] + 1;
    }
}
I tried this, but the performance is very bad (worse than serial execution):
#pragma omp parallel for schedule(static, N1/nthreads) private(i,j)
for(i=0; i<N1; i++){
    for(j=0; j<N2; j++)
    {
        #pragma omp atomic
        histogram[img[i][j]]++;
    }
}
Any suggestions? Thank you.
I already went into detail on how to do this in "Fill histograms (array reduction) in parallel with OpenMP without using a critical section".
It's the same as an array reduction. Before version 4.5, OpenMP had no built-in support for this in C/C++ (though it did in Fortran), so with older compilers you have to do it yourself.
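For compilers that implement OpenMP 4.5 or later, an array section can appear in a `reduction` clause in C/C++, which takes care of the per-thread copies and the final merge. The following is a minimal sketch under some assumptions: the function name `histogram_reduction`, the `NUM_COLOURS` constant, and the flattened one-dimensional `img` layout are illustrative and not taken from the question.

```c
#include <stddef.h>

#define NUM_COLOURS 10  /* assumed: colour values lie in 0..9, as in the question */

/* Counts colour occurrences in a flat image of n pixels.
   The array-section reduction needs OpenMP 4.5 or later; without OpenMP
   the pragma is ignored and the loop simply runs serially. */
void histogram_reduction(const int *img, size_t n, int histogram[NUM_COLOURS])
{
    int hist[NUM_COLOURS] = {0};

    /* Each thread gets a zero-initialised private copy of hist;
       the copies are summed into hist when the loop ends. */
    #pragma omp parallel for reduction(+:hist[:NUM_COLOURS])
    for (long k = 0; k < (long)n; k++) {
        hist[img[k]]++;
    }

    for (int c = 0; c < NUM_COLOURS; c++) {
        histogram[c] = hist[c];
    }
}
```

With GCC or Clang this would typically be built with `-fopenmp`. A two-dimensional image stored contiguously can be passed as a pointer to its first element with `n = N1 * N2`.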
The easy solution is to create a private version of the histogram for each thread, fill them in parallel, and then merge them into one histogram in a critical section. In your case you can do that like this:
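int i, histogram[10];
for(i=0; i<10; i++) histogram[i] = 0;
#pragma omp parallel
{
    /* declared inside the parallel region, so each thread has its own copy */
    int i, j, histogram_private[10];
    for(i=0; i<10; i++) histogram_private[i] = 0;
    /* nowait: a thread can start merging as soon as its share of rows is done */
    #pragma omp for nowait
    for(i=0; i<N1; i++) {
        for(j=0; j<N2; j++) {
            histogram_private[img[i][j]]++;
        }
    }
    /* one thread at a time adds its private counts to the shared histogram */
    #pragma omp critical
    {
        for(i=0; i<10; i++) histogram[i] += histogram_private[i];
    }
}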
It's possible to merge in parallel as well but that's more complicated. See the first link I mentioned for more details.
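As a rough illustration of what a parallel merge could look like, here is a sketch in which every thread publishes its private counts into its own row of a shared buffer, and the bins are then summed with a second work-shared loop. This is one possible arrangement, not necessarily the one described in the linked answer; the names `histogram_parallel_merge`, `partial`, and `NUM_COLOURS`, and the flattened `img` layout, are assumptions made for the example.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

#define NUM_COLOURS 10  /* assumed: colour values lie in 0..9 */

/* Returns 0 on success, -1 if the shared buffer cannot be allocated. */
int histogram_parallel_merge(const int *img, long n, int histogram[NUM_COLOURS])
{
    int *partial = NULL;  /* shared: one row of NUM_COLOURS counts per thread */
    int nthreads = 1;
    int failed = 0;

    for (int c = 0; c < NUM_COLOURS; c++) histogram[c] = 0;

    #pragma omp parallel
    {
        const int tid = omp_get_thread_num();

        #pragma omp single
        {
            nthreads = omp_get_num_threads();
            partial = malloc((size_t)nthreads * NUM_COLOURS * sizeof *partial);
            if (partial == NULL) failed = 1;
        }   /* implicit barrier: all threads now see partial, nthreads, failed */

        if (!failed) {
            /* Count into a private array to avoid false sharing in the hot loop. */
            int local[NUM_COLOURS] = {0};
            #pragma omp for nowait
            for (long k = 0; k < n; k++) {
                local[img[k]]++;
            }

            /* Publish this thread's counts into its own row. */
            for (int c = 0; c < NUM_COLOURS; c++) {
                partial[(size_t)tid * NUM_COLOURS + c] = local[c];
            }
            #pragma omp barrier   /* every row must be written before merging */

            /* Merge in parallel: each bin is summed by exactly one thread. */
            #pragma omp for
            for (int c = 0; c < NUM_COLOURS; c++) {
                int sum = 0;
                for (int t = 0; t < nthreads; t++) {
                    sum += partial[(size_t)t * NUM_COLOURS + c];
                }
                histogram[c] = sum;
            }
        }
    }

    free(partial);
    return failed ? -1 : 0;
}
```

With only 10 bins the merge is cheap either way, so the critical-section version is probably sufficient here; a parallel merge is more likely to matter when the histogram has many bins.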