I'm trying to make a parallel cilk code of this code using cilk_for:
c[0:2*n-1] = 0;
for (size_t i=0; i<n; ++i)
c[i:n] += a[i]*b[0:n];
in serial code:
for( size_t j=0; j<2*n-1; ++j )
c[j] = 0;
for (size_t i=0; i<n; ++i)
for( size_t j=0; j<n; ++j )
c[i+j] += a[i]*b[j];
For example:
x^2+x+1
2x^2+3x+5
C[0]=A[0]·B[0]
C[1]=A[0]·B[1]+A[1]·B[0]
.....
Easiest way would be to write a cilk_for loop that loops over the output coefficients, and inside the loop, for each output coefficient, accumulate an inner product.
Call the output coefficient c[k]. The loop will look like:
cilk_for( k=0; k<2n-1; ++k )
c[k] = __sec_reduce( a[...:...]*b[...:...:-1] );
The ... need to be expressions that yield the subsections that contribute to each output coefficient. I have an intermittent Internet connection, so I'm leaving that as an exercise to the reader.
The downdload site for the book (http://parallelbook.com/downloads) has a recursive version that is asymptotically faster than the scheme above.