Basically, I am having a difficult time understanding exactly what is going wrong here.
Shared memory does not appear to be behaving in a block-exclusive manner while debugging. When running the code normally, nothing is printed. However, if I attempt to debug it, shared memory appears to be shared between blocks and the print statement is reached.
This is just an example. It obviously isn't terribly useful code, but it reproduces the issue on my system. Am I doing something wrong? Is this a bug or expected behavior from the debugger?
__global__
void test()
{
    __shared__ int result[1];
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0)
        result[0] = 4444;
    else
    {
        if (result[0] == 4444)
            printf("This should never print if shared memory is unique\n");
    }
}
And to launch it:
test<<<dim3(8,8,1), dim3(8,8,1)>>>();
It is also entirely possible that I have completely misunderstood shared memory.
Thanks for the help.
Other Information:
I am using a GTX 460. compute_20 and sm_20 are set for the project. I am writing the code in Visual Studio 2010 using Nsight 3.0 preview.
There is a subtle but important difference between
shared memory is shared between blocks and the print statement is reached
and
shared memory is re-used by successive blocks and the print statement is reached
You are assuming the former, but the latter is what is really happening.
Your code, with the exception of the first block, is reading from uninitialised memory. That, in itself, is undefined behaviour. C++ (and CUDA) don't guarantee that statically declared memory is set to any value when it either comes into or goes out of scope. You can't expect that result wouldn't have a value of 4444, especially when it is probably stored in the same shared scratch space as a previous block which may have set it to a value of 4444.
The entire premise of the code and this question is flawed, and you should draw no conclusions from the result you see other than that undefined behaviour is undefined.
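If the aim is to check that each block really has its own copy of result, every block has to read a value that it wrote itself. The following is only a minimal sketch of that idea, not a definitive test: the names test_initialised, stale_reads and count_stale_reads are hypothetical and not from the question, and error checking on the CUDA runtime calls is left out for brevity. One thread per block initialises result, __syncthreads() makes that write visible to the rest of the block, and only then is the value read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel from the question: every block
// initialises its own copy of `result` before any thread reads it.
__global__ void test_initialised(int *stale_reads)
{
    __shared__ int result[1];

    const bool first_block = (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0);

    // One thread per block writes a defined value: 4444 in block (0,0,0),
    // 0 in every other block.
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        result[0] = first_block ? 4444 : 0;

    // Make the write visible to all threads of this block before reading.
    __syncthreads();

    // Count threads outside the first block that still observe 4444.
    if (!first_block && result[0] == 4444)
        atomicAdd(stale_reads, 1);
}

// Launches the kernel with the same grid and block dimensions as the
// question and returns the counter copied back from the device.
int count_stale_reads()
{
    int *d_count = nullptr;
    int h_count = -1;

    cudaMalloc(&d_count, sizeof(int));
    cudaMemset(d_count, 0, sizeof(int));

    test_initialised<<<dim3(8, 8, 1), dim3(8, 8, 1)>>>(d_count);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return h_count;
}

int main()
{
    printf("stale reads: %d\n", count_stale_reads());
    return 0;
}
```

Because every read is now preceded by a write from the same block, the counter no longer depends on whatever a previous block left behind in the same physical shared memory, so the result should be the same with or without the debugger attached.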