I declared shared memory and tried to trace it with Nsight 2.2 for Visual Studio 2010. I'm using CUDA 4.2 with a Quadro 5000.
In my kernel.cu:
extern __shared__ ushort2 sampleGatheringSM[];
The code launching the kernel:
sampleGathering_SM_size = dimBlock.x * dimBlock.y * 4 * sizeof(ushort2) * 2; // = 10240
sampleGatheringKernel<<<dimGrid, dimBlock, sampleGathering_SM_size >>>(dev_image, dev_gradient, width, height);
When I open the analysis activity report in Nsight and look at "CUDA Launches", it tells me that:
Did I allocate shared memory correctly? I don't understand how I could allocate registers.
EDIT:
It also tells me:
The declaration of dynamic shared memory is correct. The Nsight 2.2 Analysis Trace Report has a bug that affects only CUDA Trace Activities. Two kinds of activity display the correct value: Analysis Trace Activities run with Nsight | Options | Analysis | CUDA Kernel Trace Mode set to Serialized, and Analysis Profiler CUDA Activities. This bug will be fixed in the next version of Nsight.
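If you want to sanity-check the allocation without relying on the trace report, something like the sketch below may help. It is not your actual kernel: touchSharedKernel, sharedBytesFor, and the 32x10 block and 4x4 grid dimensions are made up for illustration (32x10 was picked only because it reproduces the 10240 bytes from the question). Each thread writes and reads back its own 8 ushort2 slots, and the host checks that the launch succeeded.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same declaration as in the question: the size comes from the launch, not from here.
extern __shared__ ushort2 sampleGatheringSM[];

// Bytes of dynamic shared memory per block, using the formula from the question
// (4 * 2 ushort2 slots per thread). Hypothetical helper, not part of the original code.
size_t sharedBytesFor(dim3 block)
{
    return (size_t)block.x * block.y * 4 * sizeof(ushort2) * 2;
}

// Hypothetical stand-in kernel: every thread fills its own slots, then verifies them.
__global__ void touchSharedKernel(int *mismatches)
{
    const int slotsPerThread = 4 * 2;
    const int tid  = threadIdx.y * blockDim.x + threadIdx.x;
    const int base = tid * slotsPerThread;

    for (int i = 0; i < slotsPerThread; ++i)
        sampleGatheringSM[base + i] = make_ushort2((unsigned short)tid, (unsigned short)i);
    __syncthreads();

    for (int i = 0; i < slotsPerThread; ++i) {
        ushort2 v = sampleGatheringSM[base + i];
        if (v.x != (unsigned short)tid || v.y != (unsigned short)i)
            atomicAdd(mismatches, 1);
    }
}

int main()
{
    dim3 dimBlock(32, 10);  // assumed: 320 threads * 8 slots * 4 bytes = 10240 bytes
    dim3 dimGrid(4, 4);     // assumed
    size_t smBytes = sharedBytesFor(dimBlock);
    printf("requesting %lu bytes of dynamic shared memory per block\n",
           (unsigned long)smBytes);

    int *devMismatches = 0;
    if (cudaMalloc((void **)&devMismatches, sizeof(int)) != cudaSuccess ||
        cudaMemset(devMismatches, 0, sizeof(int)) != cudaSuccess) {
        printf("device allocation failed\n");
        return 1;
    }

    // The third launch parameter is the dynamic shared memory size in bytes.
    touchSharedKernel<<<dimGrid, dimBlock, smBytes>>>(devMismatches);

    // A request above the per-block limit shows up here as a launch error.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(devMismatches);
        return 1;
    }

    int mismatches = -1;
    cudaMemcpy(&mismatches, devMismatches, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devMismatches);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

On a Fermi card such as the Quadro 5000 this should build with something like nvcc -arch=sm_20 (the file name is up to you). A clean run only shows that the launch was accepted and that every slot could be used. Running it under cuda-memcheck may additionally catch out-of-bounds shared memory accesses if the size were too small.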