In OpenCL C kernel code, the built-in functions perform well, but what about user-defined functions? Do they cost performance or memory compared with the built-in ones? If so, is it better to write such a function once and call it from the __kernel void function, or to repeat its code at every place it is needed?
For example:
gentype clamp ( gentype x,
gentype minval,
gentype maxval)
The above is a built-in function; it has no impact on performance, nor does it reduce GPU L0/L1 cache memory.
By a user-defined function I mean something like this:
int Add(int a, int b)
{
    return a + b;
}
Do these functions have any impact on L0/L1 memory? If so, is it better not to write them as functions and instead repeat the code everywhere it is needed?
I usually inline all functions, except if they are very lengthy and are called many times within a kernel. For example,
float __attribute__((always_inline)) sq(const float x) {
    return x*x;
}
computes the square of x. An inlined function adds no computational cost for the call itself. However, if you inline a very long function many times in a kernel, the assembly blows up and spills into global memory, resulting in a loss of performance. In that case it is better to leave the function as a real call: the call overhead is negligible compared to the execution time of the function itself.
Finally, if you don't explicitly inline a very short function, the compiler will do it automatically in most cases. The same is true for loop unrolling with #pragma unroll.
Regarding the math functions, most of them map directly to hardware instructions, with a few exceptions. For example, the count-leading-zeros function, int y = clz(x), despite being translated into the clz PTX instruction, has no dedicated hardware and is slower than emulating it with int y = 158-(int)(as_uint((float)x)>>23) (158 is 31 plus the float exponent bias of 127; the result is exact for 0 < x < 2^24, where the int-to-float conversion does not round). Similarly, although the inverse square root rsqrt(x) is executed in hardware,
float __attribute__((always_inline)) fast_rsqrt(const float x) {
    return as_float(0x5F37642F-(as_int(x)>>1));
}
runs slightly faster but is less accurate. In most cases, though, the built-in math functions are the best option.
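If you want to sanity-check the two bit tricks before putting them in a kernel, here is a minimal host-side sketch in plain C. It is not OpenCL C: as_uint, as_int and as_float are emulated with memcpy, and clz_reference, clz_via_float, count_clz_mismatches, rsqrt_residual and self_check are hypothetical helper names made up for this illustration. Because the float exponent field is stored with a bias of 127, the clz emulation here subtracts the shifted bits from 158 (31 + 127). It says nothing about speed, only about correctness and accuracy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-ins for OpenCL's as_uint / as_int / as_float:
   they reinterpret the 32 bits of a value without converting it. */
uint32_t as_uint(float f)  { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
int32_t  as_int(float f)   { int32_t i;  memcpy(&i, &f, sizeof i); return i; }
float    as_float(int32_t i) { float f;  memcpy(&f, &i, sizeof f); return f; }

/* Straightforward reference: count zero bits from the top down. */
int clz_reference(uint32_t v) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1) n++;
    return n;
}

/* Emulation via the float exponent. Assumes 0 < x < 2^24, where the
   int-to-float conversion is exact. The exponent field holds
   floor(log2(x)) + 127, hence 158 = 31 + 127. */
int clz_via_float(int32_t x) {
    return 158 - (int)(as_uint((float)x) >> 23);
}

/* Number of x in [1, limit) where the emulation disagrees with the reference. */
int count_clz_mismatches(int32_t limit) {
    int mismatches = 0;
    for (int32_t x = 1; x < limit; x++) {
        if (clz_via_float(x) != clz_reference((uint32_t)x)) mismatches++;
    }
    return mismatches;
}

/* Same expression as the fast_rsqrt shown above, for x > 0. */
float fast_rsqrt(float x) {
    return as_float(0x5F37642F - (as_int(x) >> 1));
}

/* x*y*y - 1 is zero for an exact inverse square root, so its size
   shows how far the approximation is off. */
float rsqrt_residual(float x) {
    float y = fast_rsqrt(x);
    return x * y * y - 1.0f;
}

/* Aborts via assert if either trick misbehaves in its stated range. */
void self_check(void) {
    assert(count_clz_mismatches(1 << 24) == 0);
    for (float x = 0.25f; x < 1000.0f; x *= 1.5f) {
        float r = rsqrt_residual(x);
        assert(r > -0.1f && r < 0.1f);
    }
}
```

Calling self_check() from a small main is enough to run it. Note that above 2^24 the float conversion can round up to the next power of two, so for example x = 0x01FFFFFF comes out one too low with the emulation. If your inputs can get that large, stick with clz.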