I would like to speed up this code with OpenMP:
    for (i = 1; i < length; i++)
    {
        for (j = 1; j < i; j++)
            sum_c += c[j];
        c[i] = sum_c + W[i];
        sum_c = 0;
    }
I tried this solution:
    for (i = 1; i < length; i++)
    {
        #pragma omp parallel for num_threads(NTHREADS) reduction(+:sum_c)
        for (j = 1; j < i; j++)
            sum_c += c[j];
        c[i] = sum_c + W[i];
        sum_c = 0;
    }
but I only get a speedup for length=100000.
Are there any better solutions? The code can be rewritten.
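Your inner loop repeats a lot of work: every outer iteration re-adds all the earlier c[j], even though the sum only grows by the one element computed in the previous iteration. The same result can be obtained with a single loop that keeps a running sum: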
    sum_c = 0;
    for (i = 1; i < length; i++)
    {
        c[i] = sum_c + W[i];
        sum_c += c[i];
    }
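
To convince yourself that the two versions agree, here is a minimal sketch that runs both on the same input and compares the results. The element type (long), the function names (fill_quadratic, fill_linear, versions_agree) and the scratch buffers are my assumptions for illustration; the question does not show the real types. Note that the single loop carries a dependency on sum_c from one iteration to the next, so a plain "parallel for" does not apply to it, but it does O(length) work instead of O(length^2).

```c
#include <assert.h>
#include <stddef.h>

/* Original formulation: re-sums c[1..i-1] for every i, O(length^2).
   Element type long is an assumption; index 0 is left untouched. */
void fill_quadratic(long *c, const long *W, size_t length)
{
    assert(c != NULL && W != NULL);
    for (size_t i = 1; i < length; i++) {
        long sum_c = 0;
        for (size_t j = 1; j < i; j++)
            sum_c += c[j];
        c[i] = sum_c + W[i];
    }
}

/* Running-sum formulation: one pass, O(length). */
void fill_linear(long *c, const long *W, size_t length)
{
    assert(c != NULL && W != NULL);
    long sum_c = 0;
    for (size_t i = 1; i < length; i++) {
        c[i] = sum_c + W[i];   /* sum_c holds c[1] + ... + c[i-1] here */
        sum_c += c[i];         /* extend the running sum for the next i */
    }
}

/* Hypothetical helper: fills both scratch buffers from W and returns 1
   if they match on indices 1..length-1, 0 otherwise. */
int versions_agree(const long *W, size_t length, long *c_quad, long *c_lin)
{
    fill_quadratic(c_quad, W, length);
    fill_linear(c_lin, W, length);
    for (size_t i = 1; i < length; i++)
        if (c_quad[i] != c_lin[i])
            return 0;
    return 1;
}
```

Be aware that the values roughly double at every step (the running sum satisfies S[i] = 2*S[i-1] + W[i]), so with an integer type they overflow long before length reaches 100000; keep the test input small if you check this way.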