Tags: c++, pointers, cuda, gpu-shared-memory

Local pointer to shared memory in CUDA


How can I give each thread in a thread block its own pointer to shared memory? I found an example that declares such a pointer:

int __shared__ *p;
__shared__ int array[256];

p = &array[threadId];

Is this right or is there another way?


Solution

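  • No, that is not correct. In that example, p itself is declared __shared__, so there is only one p for the whole block. Every thread would write its address into that same pointer variable, which is a race, and all threads would then read whichever address happened to be stored last. You could do it like this, provided threadId is the unique thread index within the block: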

    int *p;
    __shared__ int array[256];
    
    p = &array[threadId];
    

    In this case p is an ordinary per-thread variable, so the compiler keeps it in a register or in thread-local memory. Each thread in the block then holds the address of its own element of the statically allocated shared memory array.
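
For completeness, here is a minimal sketch of a full program built around that pattern. It assumes a single 1D block of 256 threads, so threadIdx.x serves as threadId. The names perThreadPointer, runDemo and kBlockSize are made up for this illustration and do not come from the question, and error checking on the CUDA runtime calls is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // assumed block size; matches the shared array

// Each thread gets its own pointer p (an ordinary local variable) that
// points at its own slot in the block's shared array.
__global__ void perThreadPointer(const int *in, int *out)
{
    __shared__ int array[kBlockSize];   // one copy per block

    int threadId = threadIdx.x;         // unique within a 1D block
    int *p = &array[threadId];          // per-thread pointer into shared memory

    *p = in[threadId];                  // each thread writes only its own slot
    __syncthreads();                    // make all slots visible to the block

    // Also read the next thread's slot to show the storage really is shared.
    out[threadId] = *p + array[(threadId + 1) % kBlockSize];
}

// Hypothetical host helper: launches one block and returns the results.
// CUDA error checking is omitted for brevity.
std::vector<int> runDemo()
{
    std::vector<int> h_in(kBlockSize), h_out(kBlockSize);
    for (int i = 0; i < kBlockSize; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, kBlockSize * sizeof(int));
    cudaMalloc(&d_out, kBlockSize * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kBlockSize * sizeof(int),
               cudaMemcpyHostToDevice);

    perThreadPointer<<<1, kBlockSize>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, kBlockSize * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    std::vector<int> out = runDemo();
    printf("out[0] = %d, out[%d] = %d\n",
           out[0], kBlockSize - 1, out[kBlockSize - 1]);
    return 0;
}
```

The __syncthreads() call matters only because each thread also reads a neighbour's slot. If every thread touches nothing but the element its own p points to, no barrier is needed.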