I just started learning CUDA, and I'm having trouble converting some code into two versions for comparison purposes: one that uses shared memory and one that uses constant memory.
__global__ void CUDA(int *device_array_Image1, int *device_array_Image2, int *device_array_Image3,
                     int *device_array_kernel,
                     int *device_array_Result1, int *device_array_Result2, int *device_array_Result3)
{
    int i = blockIdx.x;   // row index: one block per image row
    int j = threadIdx.x;  // column index: one thread per pixel in the row
    int ArraySum1 = 0;    // set sum = 0 initially
    int ArraySum2 = 0;
    int ArraySum3 = 0;
    // Apply the 3x3 kernel to the neighbourhood of pixel (i, j) in each image.
    // Note: for border pixels (i or j equal to 0 or Image_Size - 1) these reads
    // fall outside the image unless the buffers are padded.
    for (int N = -1; N <= 1; N++)
    {
        for (int M = -1; M <= 1; M++)
        {
            ArraySum1 = ArraySum1 + (device_array_Image1[(i + N) * Image_Size + (j + M)] * device_array_kernel[(N + 1) * 3 + (M + 1)]);
            ArraySum2 = ArraySum2 + (device_array_Image2[(i + N) * Image_Size + (j + M)] * device_array_kernel[(N + 1) * 3 + (M + 1)]);
            ArraySum3 = ArraySum3 + (device_array_Image3[(i + N) * Image_Size + (j + M)] * device_array_kernel[(N + 1) * 3 + (M + 1)]);
        }
    }
    device_array_Result1[i * Image_Size + j] = ArraySum1;
    device_array_Result2[i * Image_Size + j] = ArraySum2;
    device_array_Result3[i * Image_Size + j] = ArraySum3;
}
This is what I have done so far, but I'm having trouble understanding shared and constant memory, so if anyone could help with the code or point me in the right direction, I'd be really grateful.
Thanks for any help.
a) Shared memory: This memory is visible only to the threads of a single block. It is useful when the threads of that block access the same data more than once. So it does not help when squaring a number, where each element is read only once, but it does help in matrix multiplication, where each element is reused many times.
b) Constant memory: The data is stored in device global memory and is read through each multiprocessor's constant cache. The device has 64 KB of constant memory in total, and each multiprocessor has a small constant cache (8 KB on many GPUs). Reads are broadcast to the threads of a warp, so if all threads in the warp request the same address, the value is delivered to all of them in a single request. If they request different addresses, the reads are serialized.
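As a rough sketch of how both ideas might apply to the 3x3 convolution in the question, here is one possible layout. It is not a drop-in replacement for the original kernel: it assumes a single image, a square image whose side IMAGE_SIZE is a multiple of TILE, 16x16 thread blocks, and border pixels that are simply left at zero. The names const_kernel, convolveConstant, convolveShared, runConvolution, IMAGE_SIZE and TILE are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define IMAGE_SIZE 64  // hypothetical image width and height
#define TILE 16        // hypothetical block edge; IMAGE_SIZE must be a multiple of it

// 3x3 filter in constant memory. All threads of a warp read the same
// coefficient at the same time, so each read is a broadcast.
__constant__ int const_kernel[9];

// Variant 1: image read from global memory, filter read from constant memory.
__global__ void convolveConstant(const int *image, int *result)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    // Skip the one-pixel border so the neighbour reads stay in bounds.
    if (row < 1 || col < 1 || row >= IMAGE_SIZE - 1 || col >= IMAGE_SIZE - 1) return;

    int sum = 0;
    for (int n = -1; n <= 1; n++)
        for (int m = -1; m <= 1; m++)
            sum += image[(row + n) * IMAGE_SIZE + (col + m)] * const_kernel[(n + 1) * 3 + (m + 1)];
    result[row * IMAGE_SIZE + col] = sum;
}

// Variant 2: each block first copies its tile plus a one-pixel halo into
// shared memory, then every thread reads its 9 neighbours from there.
// Assumes the kernel is launched with blockDim = (TILE, TILE).
__global__ void convolveShared(const int *image, int *result)
{
    __shared__ int tile[TILE + 2][TILE + 2];

    int row0 = blockIdx.y * TILE - 1;  // image row stored in tile[0][...]
    int col0 = blockIdx.x * TILE - 1;  // image column stored in tile[...][0]

    // Cooperative load: TILE*TILE threads fill (TILE+2)*(TILE+2) cells.
    for (int idx = threadIdx.y * TILE + threadIdx.x;
         idx < (TILE + 2) * (TILE + 2); idx += TILE * TILE) {
        int ty = idx / (TILE + 2);
        int tx = idx % (TILE + 2);
        int r = row0 + ty;
        int c = col0 + tx;
        bool inside = r >= 0 && r < IMAGE_SIZE && c >= 0 && c < IMAGE_SIZE;
        tile[ty][tx] = inside ? image[r * IMAGE_SIZE + c] : 0;
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    if (row < 1 || col < 1 || row >= IMAGE_SIZE - 1 || col >= IMAGE_SIZE - 1) return;

    int sum = 0;
    for (int n = -1; n <= 1; n++)
        for (int m = -1; m <= 1; m++)
            sum += tile[threadIdx.y + 1 + n][threadIdx.x + 1 + m] * const_kernel[(n + 1) * 3 + (m + 1)];
    result[row * IMAGE_SIZE + col] = sum;
}

// Hypothetical host helper: runs one variant and returns the result image.
// Error checking is omitted to keep the sketch short.
std::vector<int> runConvolution(const std::vector<int> &image, const int *filter, bool useShared)
{
    size_t bytes = IMAGE_SIZE * IMAGE_SIZE * sizeof(int);
    int *d_image = nullptr;
    int *d_result = nullptr;
    cudaMalloc(&d_image, bytes);
    cudaMalloc(&d_result, bytes);
    cudaMemcpy(d_image, image.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, bytes);  // border pixels stay 0
    cudaMemcpyToSymbol(const_kernel, filter, 9 * sizeof(int));

    dim3 block(TILE, TILE);
    dim3 grid(IMAGE_SIZE / TILE, IMAGE_SIZE / TILE);
    if (useShared)
        convolveShared<<<grid, block>>>(d_image, d_result);
    else
        convolveConstant<<<grid, block>>>(d_image, d_result);

    std::vector<int> result(IMAGE_SIZE * IMAGE_SIZE);
    cudaMemcpy(result.data(), d_result, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_image);
    cudaFree(d_result);
    return result;
}
```

Calling runConvolution once with useShared set to false and once with it set to true on the same input should give identical results, which makes it easy to time the two variants against each other. For three images, as in the question, the same kernels could be launched once per image.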
The links below helped me understand constant and shared memory:
1) http://cuda-programming.blogspot.in/2013/01/what-is-constant-memory-in-cuda.html
2) http://cuda-programming.blogspot.in/2013/01/shared-memory-and-synchronization-in.html
3) https://devblogs.nvidia.com/parallelforall/using-shared-memory-cuda-cc/
Please refer to these links.