Consider the following CUDA kernel, taken from the CuPy docs and used from Python via CuPy:
add_kernel = cp.RawKernel(r'''
extern "C" __global__
void my_add(const float* x1, const float* x2, float* y) {
extern __shared__ int sharedValues[];
int tid = blockDim.x * blockIdx.x + threadIdx.x;
y[tid] = x1[tid] + x2[tid];
}
''', 'my_add')
x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
n = 1
shared_memory_size = n*64; #*sizeof(int)
# Output Results
y = cp.zeros((5, 5), dtype=cp.float32)
add_kernel((512,), (1024,), (x1, x2, y)) # grid, block and arguments
I would like to set the size of the dynamic shared memory in a CuPy raw CUDA kernel, but I don't know how to pass the parameter shared_memory_size to add_kernel.
Example taken from "Using Shared Memory in CUDA C/C++" (NVIDIA blog post):
add_kernel<<<512,1024,n*sizeof(int)>>>(x1,x2,y);
When I try to call it with an additional parameter, I get an error:
add_kernel((512,), (1024,), (shared_memory_size,), (x1, x2, y)) # grid, block and arguments
When I try to set the attribute directly:
assign_importance_into_dense_array_kernel_int32.shared_size_bytes = shared_memory_size
I get the error:
AttributeError: attribute 'shared_size_bytes' of 'cupy._core.raw.RawKernel' objects is not writable
There is also the attribute add_kernel.max_dynamic_shared_size_bytes, but does setting that change the dynamic shared memory size?
As @Robert said, you specify the dynamic shared memory size when you call the kernel, i.e. in the cp.RawKernel.__call__ method. Per the documentation (https://docs.cupy.dev/en/stable/reference/generated/cupy.RawKernel.html), its signature is:
__call__(self, grid, block, args, *, shared_mem=0)
Set the keyword-only parameter shared_mem to the size in bytes (the equivalent of n*sizeof(int) in CUDA C/C++), and you'll be good to go.
...
shared_memory_size = n * 4  # bytes; Python has no sizeof, and a CUDA int is 4 bytes
add_kernel((512,), (1024,), (x1, x2, y), shared_mem=shared_memory_size)
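For completeness, here is a minimal end-to-end sketch of the above. Several things in it are my own assumptions rather than part of the question: the helper names shared_bytes_for_ints and launch_add are made up for illustration, ctypes.sizeof(ctypes.c_int) is used as a Python stand-in for sizeof(int), and I added an element count n with a bounds check plus a smaller launch configuration so that threads beyond the 25 elements do not write out of range.

```python
import ctypes


def shared_bytes_for_ints(n_ints):
    """Bytes needed for n_ints C ints: a stand-in for n * sizeof(int)."""
    return n_ints * ctypes.sizeof(ctypes.c_int)


def launch_add(n_shared_ints=64):
    """Hypothetical wrapper: compile my_add and launch it with dynamic shared memory."""
    import cupy as cp  # imported lazily so the helper above works without a GPU

    add_kernel = cp.RawKernel(r'''
    extern "C" __global__
    void my_add(const float* x1, const float* x2, float* y, int n) {
        // Sized at launch time by the shared_mem keyword argument.
        extern __shared__ int sharedValues[];
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n) {  // bounds check added for this sketch
            y[tid] = x1[tid] + x2[tid];
        }
    }
    ''', 'my_add')

    x1 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
    x2 = cp.arange(25, dtype=cp.float32).reshape(5, 5)
    y = cp.zeros((5, 5), dtype=cp.float32)

    # grid, block, arguments, then the dynamic shared memory size in bytes
    add_kernel((1,), (32,), (x1, x2, y, cp.int32(x1.size)),
               shared_mem=shared_bytes_for_ints(n_shared_ints))
    return y
```

Calling launch_add() on a machine with a CUDA device should return x1 + x2. The kernel does not actually use sharedValues here, just as in the question; the point is only where the size goes.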