cuda, nsight, gpu-shared-memory

Shared memory allocation in CUDA


I am trying to allocate a block of static shared memory of the following size on an NVIDIA Quadro 4000 (compute capability 2.0):

__shared__ char temp [128][128];

However, when I look at the array in the Nsight debugger I can only see 64*64 cells. Where are the rest of the cells? In the profiler, the shared memory column shows 16 KB, which is what I expect for 128*128 one-byte elements.

What gives?


Solution

  • The Nsight Visual Studio Edition CUDA Debugger has several options that control how expressions in the variable watch windows are evaluated and displayed. The default setting for array expansion is 64 elements. The limit is in place to limit the cost of evaluating large arrays, so the full 128*128 array is still allocated; the debugger simply does not expand all of it.

    To change the settings

    1. From the top level Nsight menu execute the command Options...
    2. In the NVIDIA Nsight Options dialog

      • On the left pane select debugger
      • On the right pane change the setting Max array expansion elements to 128

    An alternative for your use case is to open one of the four Memory Windows and configure columns = 64 and type = 1-byte integer. If the values are text rather than numbers, you can disable Data and set text to ANSI text.
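
To confirm independently of the debugger display that all 128*128 cells exist and are usable, one option is to fill the shared array in a kernel and copy it back to the host for checking. The following is only a minimal sketch: the kernel name `fillShared`, the fill pattern, and the single-block launch of 128 threads are assumptions made for illustration and are not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128

// Hypothetical kernel: fills a 128x128 static shared array and copies it out.
__global__ void fillShared(char *out)
{
    __shared__ char temp[N][N];   // 128 * 128 * 1 byte = 16384 bytes (16 KB)

    // One thread per column; each thread walks every row of its column.
    int col = threadIdx.x;
    for (int row = 0; row < N; ++row)
        temp[row][col] = (char)((row + col) % 127);

    // Not strictly required here, since each thread reads back only what it
    // wrote, but kept as the usual pattern after writing shared memory.
    __syncthreads();

    for (int row = 0; row < N; ++row)
        out[row * N + col] = temp[row][col];
}

int main()
{
    const size_t bytes = N * N * sizeof(char);
    static char h_out[N * N];
    char *d_out = nullptr;

    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    fillShared<<<1, N>>>(d_out);   // one block of 128 threads
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Compare every cell against the pattern the kernel wrote.
    int mismatches = 0;
    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col)
            if (h_out[row * N + col] != (char)((row + col) % 127))
                ++mismatches;

    printf("shared bytes: %zu, mismatches: %d\n", bytes, mismatches);
    cudaFree(d_out);
    return mismatches != 0;
}
```

If the allocation behaves as the profiler suggests, this should report zero mismatches, which would indicate that the 64-element limit is purely a display setting in the watch window.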