I have a simple problem: I want to copy randomly generated floats to an OpenCL device and use those values there, because OpenCL
doesn't provide a random number generator.
However, the values do not seem to be read properly on the device. The values returned in HOST_result
are not the ones initially generated; they look like stale values that happened to be in that memory.
Here is a minimal non-working example. Could someone point out what is wrong with this code?
import pyopencl as cl
import numpy as np
kernelSource = """
__kernel void testKernel(__global float *result, __global float *randomNum)
{
int gid = get_global_id(0);
result[gid] = randomNum[gid];
}
"""
context = cl.create_some_context()
queue = cl.CommandQueue(context)
device = context.devices[0]
program = cl.Program(context, kernelSource).build()
N = 4
HOST_result = np.ones(N, dtype=cl.cltypes.float)
print(HOST_result.shape)
DEVICE_result = cl.Buffer(context, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_result)
HOST_rand = np.random.uniform(low=0.0, high=1.0, size=N)
print(HOST_rand.shape)
HOST_rand.astype(cl.cltypes.float)
DEVICE_rand = cl.Buffer(context, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_rand)
program.testKernel(queue, HOST_result.shape, None, DEVICE_result, DEVICE_rand)
queue.finish()
cl.enqueue_copy(queue, dest=HOST_result, src=DEVICE_result)
print(HOST_rand)
print(HOST_result)
"""
Using cl.Buffer() with the COPY_HOST_PTR flag, the host buffer is copied to the device buffer and it allocates memory
the size of the buffer where HOST_PTR points to.
"""
The outputs are
(4,)
(4,)
[0.02692256 0.82201746 0.05025519 0.31266968]
[-6.9784595e-12 1.2153804e+00 -2.4271645e-30 1.8305043e+00]
This error does not happen if we change HOST_rand to HOST_rand = np.zeros(N, dtype=cl.cltypes.float). All the values at the end become [0, 0, 0, 0] as expected. Any np.array([]) seems to work as well.
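It turns out that HOST_rand.astype(cl.cltypes.float) does not convert the array in place: it returns a new, converted array, which my code discards. HOST_rand therefore stays float64, so the device buffer holds 8-byte doubles. Because my kernel reads float32 (4 bytes per element), the values are read incorrectly on the OpenCL device.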
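To see the effect of the discarded astype call without a device, here is a minimal NumPy-only sketch. It assumes cl.cltypes.float corresponds to np.float32, and it imitates the kernel's read by reinterpreting the float64 bytes as float32 with view(); the lowercase names and the fixed seed are only for illustration.

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)  # fixed seed, for illustration only
host_rand = rng.uniform(low=0.0, high=1.0, size=N)  # float64 by default

# astype returns a new array; calling it without keeping the result
# leaves host_rand untouched.
host_rand.astype(np.float32)
dtype_after_discarded_call = host_rand.dtype  # still float64

# Roughly what a float32 kernel sees: the first N 4-byte words of a
# buffer that actually holds 8-byte doubles.
seen_by_kernel = host_rand.view(np.float32)[:N]

# Keeping the result of astype gives a real float32 array.
converted = host_rand.astype(np.float32)

print(dtype_after_discarded_call, host_rand.nbytes)
print(converted.dtype, converted.nbytes)
print(seen_by_kernel)
```

The reinterpreted values have no relation to the original numbers, which matches the garbage seen in HOST_result.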
To remedy the situation, it is possible to use a NumPy Generator object, which accepts more parameters for random value generation, including the dtype.
We get the default Generator, then request the correct type directly.
rndGenerator = np.random.default_rng()
HOST_rand = rndGenerator.random(size=N, dtype=cl.cltypes.float)
Now the values are correctly read on the device.
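As a guard against the same mistake elsewhere, one option is to normalize every host array just before creating a buffer from it. The helper below is a hypothetical sketch, not part of PyOpenCL; it assumes the kernel expects float32 and uses np.ascontiguousarray to return a C-contiguous float32 array.

```python
import numpy as np


def as_kernel_float32(host_array):
    """Hypothetical helper: return a C-contiguous float32 version of host_array."""
    # ascontiguousarray converts the dtype and copies only when needed.
    return np.ascontiguousarray(host_array, dtype=np.float32)


N = 4
host_rand64 = np.random.uniform(low=0.0, high=1.0, size=N)  # float64
host_rand32 = as_kernel_float32(host_rand64)

print(host_rand64.dtype, host_rand32.dtype)
```

The returned array, rather than the original, would then be passed as hostbuf to cl.Buffer.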