c++ cuda constants compiler-optimization gpu-constant-memory

CUDA kernel: do I need to put const on all pass-by-value parameters?


__global__
void diff2D(int top, int left, int windowSize /* or: const int windowSize */, float* a1, float* a2)
{
    int mid = windowSize / 2; // or: const int mid = windowSize / 2;
    ... // templateSize, windowSize are all const.
}

My question: do I need to write "const int windowSize" explicitly to make the compiler put mid into constant memory/cache? I would rather not put "const" in front of pass-by-value primitive types, because our coding standards discourage it.

Constant parameters are put into the constant cache, but I don't know whether a constant computed inside the kernel would be put there as well.


Solution

  • All kernel parameters of any type whatsoever are passed via constant memory. See here. The type or decoration of the parameters is irrelevant for this feature.

    No other constants of any type are placed in constant memory automatically, unless you use an appropriate __constant__ decoration/usage.

    Outside of kernel parameters, specifying:

    const int mid = windowSize/2;
    

    will not cause anything to be placed in constant memory.

    One way to think about constant memory is that it is at global scope. This makes sense both for kernel parameter usage (all threads can access those) as well as __constant__ declarations (all threads can access those).

    When you declare an ordinary local variable:

    const int mid = windowSize/2;

    that is not at global scope; it has logical thread-local scope.

    A logical extension of this is that you cannot do something like:

    __constant__ int mid = windowSize/2;
    

    in the midst of your kernel code, to suddenly declare a variable in __constant__ memory. Those declarations must be made at global/file scope.
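
To make the points above concrete, here is a minimal sketch rather than a definitive implementation. It contrasts a plain by-value kernel parameter (no const needed) with a file-scope __constant__ variable filled from the host via cudaMemcpyToSymbol. The names kMid, midFromParameter, midFromConstant, and computeMid are hypothetical and do not come from the question's code; the single-thread launch is only for illustration.

```cuda
#include <cuda_runtime.h>

// __constant__ variables must be declared at global/file scope,
// never inside a kernel body. (kMid is a hypothetical name.)
__constant__ int kMid;

// windowSize is an ordinary by-value kernel parameter. Kernel parameters
// are passed via constant memory whether or not they are marked const.
__global__ void midFromParameter(int windowSize, int* out)
{
    const int mid = windowSize / 2;  // per-thread local; const only makes it read-only
    out[0] = mid;
}

// Reads the value that the host previously copied into __constant__ memory.
__global__ void midFromConstant(int* out)
{
    out[0] = kMid;
}

// Hypothetical host helper: runs one of the two kernels with a single thread
// and returns the value it wrote, or -1 if any CUDA call fails.
int computeMid(int windowSize, bool useConstantSymbol)
{
    int* dOut = nullptr;
    if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess) return -1;

    if (useConstantSymbol) {
        const int mid = windowSize / 2;  // computed once on the host
        if (cudaMemcpyToSymbol(kMid, &mid, sizeof(int)) != cudaSuccess) {
            cudaFree(dOut);
            return -1;
        }
        midFromConstant<<<1, 1>>>(dOut);
    } else {
        midFromParameter<<<1, 1>>>(windowSize, dOut);
    }

    // Catch launch failures before trusting the output buffer.
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dOut);
        return -1;
    }

    int hOut = -1;
    cudaError_t err = cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? hOut : -1;
}
```

On a working device, computeMid(9, false) and computeMid(9, true) should return the same value. The only difference is where mid lives: in the first case it is a per-thread local derived from the parameter, in the second it is a single value in __constant__ memory shared by all threads.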