opencl opencl-c sycl

OpenCL creating kernel from Host function at runtime


I'm trying out OpenCL and wondered whether there is a way to pass a function as a parameter to a kernel, or what the closest available alternative is (using OpenCL 1.2).

As an example consider a simple Monte Carlo integration like this:

/* this is 1/(2^32) */
#define MULTI (2.3283064365386962890625e-10)

/* for more information see: https://arxiv.org/pdf/2004.06278v2.pdf*/
uint
squares(ulong ctr, ulong key)
{
  ulong x, y, z;
  y = x = ctr * key;
  z = y + key;
  x = x * x + y;
  x = (x >> 32) | (x << 32);                /* round 1 */
  x = x * x + z; x = (x >> 32) | (x << 32); /* round 2 */
  return (x * x + y) >> 32;                 /* round 3 */
}

void
kernel
reduce(ulong  key,
       float  low,
       float  high,
       global float* partialSums,
       local  float* localSums)
{
  uint lid = get_local_id(0);

  float rand = squares(get_global_id(0), key) * MULTI;
  localSums[lid] = f((rand * (high - low)) + low);

  for (uint stride =  get_local_size(0) / 2; stride > 0; stride /= 2) {
    barrier(CLK_LOCAL_MEM_FENCE);

    if (lid < stride)
      localSums[lid] += localSums[lid + stride];
  }

  if (lid == 0)
    partialSums[get_group_id(0)] = localSums[0];
}

I found "Passing a function as an argument in OpenCL", which tells me that passing function pointers won't work. So I guess what would work is generating the kernel source with f defined at runtime and then compiling it (has this been done before? If so, where do I find it?). Maybe this kind of problem is easier to solve not using OpenCL but using SYCL (which I know virtually nothing about)?

I'm relatively new to this, so if this kind of problem is solved in a completely different manner, please let me know.


Solution

  • "generating the kernel source with f defined at runtime and then compiling it"

    Yeah, it can be done. You could just create the whole source from scratch at runtime and then use the classic clCreateProgramWithSource + clBuildProgram.
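A minimal sketch of the source-generation step, in plain C with no OpenCL calls. The helper name make_kernel_source and the idea of passing f as an expression in x are assumptions made for illustration; they are not part of the question or of any OpenCL API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): builds an OpenCL C source string
 * in which f is defined from `expr`, an expression in x chosen at runtime,
 * followed by `kernel_body` (e.g. squares() and the reduce kernel).
 * Returns a malloc'd string that the caller must free, or NULL on failure. */
static char *make_kernel_source(const char *expr, const char *kernel_body)
{
    static const char fmt[] =
        "float f(float x)\n"
        "{\n"
        "  return %s;\n"
        "}\n"
        "\n"
        "%s";

    assert(expr != NULL && kernel_body != NULL);

    /* First pass: ask snprintf how many characters are needed. */
    int n = snprintf(NULL, 0, fmt, expr, kernel_body);
    if (n < 0)
        return NULL;

    char *src = malloc((size_t)n + 1);
    if (src == NULL)
        return NULL;

    /* Second pass: write the source into the allocated buffer. */
    snprintf(src, (size_t)n + 1, fmt, expr, kernel_body);
    return src;
}
```

The returned string is what you would hand to clCreateProgramWithSource and then build with clBuildProgram. Since the expression is pasted in verbatim, a typo in it only shows up as a build error, so it is worth reading the build log via clGetProgramBuildInfo.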

    Another option is to split your program into a static part and a dynamically generated part, compile them separately at runtime via clCompileProgram (the static part just once), and then link them with clLinkProgram. Both calls are available in OpenCL 1.2. This could be somewhat faster, since only the generated part has to be recompiled when f changes.
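A rough sketch of how the two parts could be laid out, again in plain C without OpenCL calls. The names STATIC_PART and make_dynamic_part are hypothetical, and the layout assumes the static part only needs a prototype of f, with the definition resolved at link time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Static part: would be compiled once with clCompileProgram. It only
 * declares f, so it does not change when the integrand changes.
 * The comment inside the string is a placeholder for squares() and
 * the reduce kernel from the question. */
static const char STATIC_PART[] =
    "float f(float x);\n"
    "/* squares() and the reduce kernel go here */\n";

/* Hypothetical helper: writes the dynamic part (the definition of f) into
 * buf. Returns 0 on success, -1 if the output did not fit or formatting
 * failed. This part would be recompiled whenever `expr` changes. */
static int make_dynamic_part(char *buf, size_t cap, const char *expr)
{
    assert(buf != NULL && expr != NULL);

    int n = snprintf(buf, cap,
                     "float f(float x)\n"
                     "{\n"
                     "  return %s;\n"
                     "}\n",
                     expr);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Each string would go through its own clCreateProgramWithSource and clCompileProgram call, and the two compiled programs would then be passed together to clLinkProgram to get an executable program.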

    "Maybe this kind of problem is easier to solve not using OpenCL but using SYCL"

    It might actually be harder to solve with SYCL; I'm not sure whether SYCL supports dynamic (runtime) compilation at all.