I have this code:
#pragma acc kernels
#pragma acc loop seq
for(i=0; i<bands; i++)
{
    mean=0;
    #pragma acc loop seq
    for(j=0; j<N; j++)
        mean+=(image[(i*N)+j]);
    mean/=N;
    meanSpect[i]=mean;
    #pragma acc loop
    for(j=0; j<N; j++)
        image[(i*N)+j]=image[(i*N)+j]-mean;
}
As you can see, the outer loop and the first inner loop are marked to run sequentially (loop seq), while the last inner loop can be parallelized, so I mark it with a plain loop directive.
My question is: how do I translate this to SYCL? Do I put everything inside a single q.submit() and create a parallel_for() inside it only for the parallel region? Would that be possible (and correct)?
Second question: the code above continues as follows:
#pragma acc parallel loop collapse(2)
for(j=0; j<bands; j++)
    for(i=0; i<bands; i++)
        Corr[(i*bands)+j] = Cov[(i*bands)+j]+(meanSpect[i] * meanSpect[j]);
How do I express the collapse() clause in SYCL? Does an equivalent exist, or do I have to write it another way?
Thank you very much in advance.
In case anyone sees this, here's the correct answer:
First code:
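// The sequential parts stay on the host; only the subtraction becomes a kernel.
// Note: image is read on the host and captured by value in the kernel, so it
// has to be a USM pointer usable on both sides (e.g. from sycl::malloc_shared).
for(i=0; i<bands; i++)
{
    mean=0;
    for(j=0; j<N; j++)
        mean+=(image[(i*N)+j]);
    mean/=N;
    meanSpect[i]=mean;
    // One kernel launch per band; i and mean are copied into the kernel by [=].
    q.submit([&](auto &h) {
        h.parallel_for(range(N), [=](auto j) {
            image[(i*N)+j]=image[(i*N)+j]-mean;
        });
    }).wait();
}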
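To check the device result, it can help to have a host-only version of the same computation to compare against. The following is only a sketch in plain C++ (no SYCL needed): the function name remove_band_means, the use of std::vector, and the float element type are my assumptions, since the question does not show how image is declared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference (hypothetical helper, not part of the original code):
// subtracts each band's mean from its N pixels and returns the per-band
// means, i.e. the values the question stores in meanSpect.
// Assumption: elements are float; the original type is not shown.
std::vector<float> remove_band_means(std::vector<float>& image,
                                     std::size_t bands, std::size_t N) {
    assert(image.size() == bands * N);
    std::vector<float> meanSpect(bands);
    for (std::size_t i = 0; i < bands; ++i) {
        float mean = 0.0f;
        for (std::size_t j = 0; j < N; ++j)  // sequential sum ("loop seq")
            mean += image[i * N + j];
        mean /= static_cast<float>(N);
        meanSpect[i] = mean;
        for (std::size_t j = 0; j < N; ++j)  // the part the SYCL kernel performs
            image[i * N + j] -= mean;
    }
    return meanSpect;
}
```

Running this on a copy of the input and comparing it element by element with the SYCL output (allowing a small floating-point tolerance) is one way to confirm the port behaves the same.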
Second code:
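// A 2D range takes the place of collapse(2): one work-item per (j, i) pair.
q.submit([&](auto &h) {
    h.parallel_for(range<2>(bands_sycl,bands_sycl), [=](auto index) {
        int i = index[1];   // inner loop variable of the OpenACC version
        int j = index[0];   // outer loop variable of the OpenACC version
        Corr[(i*bands)+j] = Cov[(i*bands)+j]+(meanSpect[i] * meanSpect[j]);
    });
}).wait();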
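For anyone wondering what collapse(2) amounts to: the two nested loops are merged into a single iteration space of bands*bands elements, and the two indices are recovered from the flat index. The 2D range above does that mapping for you. Here is a rough plain C++ illustration of the same idea; the function name corr_flat and the float element type are assumptions of mine, not something from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes Corr with one flat loop over bands*bands
// iterations, which is the iteration space collapse(2) creates.
// Assumption: elements are float; the original type is not shown.
std::vector<float> corr_flat(const std::vector<float>& Cov,
                             const std::vector<float>& meanSpect,
                             std::size_t bands) {
    assert(Cov.size() == bands * bands);
    assert(meanSpect.size() == bands);
    std::vector<float> Corr(bands * bands);
    for (std::size_t k = 0; k < bands * bands; ++k) {
        std::size_t j = k / bands;  // outer loop variable (index[0] above)
        std::size_t i = k % bands;  // inner loop variable (index[1] above)
        Corr[i * bands + j] = Cov[i * bands + j] + meanSpect[i] * meanSpect[j];
    }
    return Corr;
}
```

Every flat iteration is independent of the others, which is why the whole space can be handed to a single parallel_for.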