
AMD OpenCL C compiler reports loops as "dead and deleted" that shouldn't be


I have the following loop in my OpenCL kernel:

__kernel void kernelA(/* many parameters */)
{
    /*  Prefetching code and other stuff
     *  ...
     *  ...
     */

    float2 valueA = 0.0f;

    #pragma unroll                              //<----- line X
    for(unsigned int i = 0; i < MAX_A; i++)     // MAX_A > 0
    {
        #pragma unroll
        for(unsigned int j = 0; j < MAX_B; j++) // MAX_B > 0
            valueA += arrayA[(i * MAX_A) + j];
    }

    /*
     *  Code that uses the result saved to valueA
     */
}

The loop sums the values contained in arrayA. I wanted to try #pragma unroll to see whether there is any performance difference between looped and unrolled execution.

But when I compile the kernel, the compiler emits the note "LOOP UNROLL: pragma unroll (line X) ignored because this loop is dead and deleted". I don't understand this message, because the code in the loop is certainly executed: MAX_A and MAX_B are definitely greater than zero, and the sum stored in valueA is used after the loop.

The same structure appears elsewhere in the code, and the compiler emits the same note there.

The compiler I use is the AMD OpenCL C compiler shipped with the APP SDK.


Solution

  • The comment by @DarkZeroes solved this. The kernel contained no instruction that wrote the result to an output array, so the loop above and everything that depended on it had no observable effect and was optimized away by the compiler.
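
A minimal sketch of the fix, assuming the kernel receives an output buffer. The parameter names arrayA (as a global input buffer) and result, and the values of MAX_A and MAX_B, are hypothetical; the original kernel's parameter list is not shown in the question. Once valueA is stored to global memory, the loops have an observable effect and the compiler should no longer be able to discard them:

```c
// Hypothetical values; the question only states that both are greater than zero.
#define MAX_A 4
#define MAX_B 4

__kernel void kernelA(__global const float2 *arrayA,  // assumed input buffer
                      __global float2 *result)        // assumed output buffer
{
    float2 valueA = (float2)(0.0f);

    #pragma unroll
    for (unsigned int i = 0; i < MAX_A; i++)
    {
        #pragma unroll
        for (unsigned int j = 0; j < MAX_B; j++)
            valueA += arrayA[(i * MAX_A) + j];  // indexing kept as in the question
    }

    // Writing to global memory is a side effect visible outside the kernel,
    // so the summation above can no longer be treated as dead code.
    result[get_global_id(0)] = valueA;
}
```

If the note still appears after adding such a store, it is worth checking that every code path that consumes valueA eventually reaches a write to a kernel output.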