I want to populate an array with float4 type. I have no idea how to initialize the array with anything other than zeros. I've tried variations of this, and the version below is what I've come up with to show what I want to do:
import pyopencl as cl
import numpy as np
kernelSource = """
__kernel void addOneToFloat4(__global float4 *a)
{
int gid = get_global_id(0);
a[gid] += 1.0f;
}
"""
context = cl.create_some_context()
queue = cl.CommandQueue(context)
device = context.devices[0]
program = cl.Program(context, kernelSource).build()
N = 10
HOST_array = np.array([[1, 0, 0, 0]]*N, dtype=cl.cltypes.float4)
TARGET_array = cl.Buffer(context, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_array)
cl.enqueue_copy(queue, dest=TARGET_array, src=HOST_array)
program.addOneToFloat4(queue, (N,), None, TARGET_array)
cl.enqueue_copy(queue, dest=HOST_array, src=TARGET_array)
queue.finish()
print(HOST_array)
Of course it doesn't work, because NumPy reads the input as having shape (N, 4), but since float4 is a single element type, the array needs shape (N,).
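To make the shape mismatch concrete, here is a minimal NumPy-only sketch. It assumes a hypothetical stand-in dtype, float4_like, with four float32 fields; this is not pyopencl's own float4, whose field names may differ. With a structured dtype, NumPy treats nested lists as an extra array dimension and treats tuples as one record each:

```python
import numpy as np

# Hypothetical stand-in for cl.cltypes.float4 (assumption: four packed float32 fields).
float4_like = np.dtype([("x", np.float32), ("y", np.float32),
                        ("z", np.float32), ("w", np.float32)])

N = 10

# Nested lists are read as an extra array dimension, giving shape (N, 4);
# each scalar is copied into every field of its element.
from_lists = np.array([[1, 0, 0, 0]] * N, dtype=float4_like)

# Tuples are read as one record each, giving shape (N,).
from_tuples = np.array([(1, 0, 0, 0)] * N, dtype=float4_like)

print(from_lists.shape)
print(from_tuples.shape)
```

Whether the tuple form also works unchanged with cl.cltypes.float4 is an assumption worth checking against your pyopencl version.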
I've seen people initialize with np.zeros(N, dtype=float4), but I don't want to initialize to 0.
I find very few practical examples for pyopencl, and the documentation doesn't always help; it doesn't even mention float3 or float4.
If we look at the OpenCL documentation, we can see that float4 is a vector type whose components are accessed as .x, .y, .z, .w. It is also declared as a type, so I expect to be able to use it like any other type.
After searching the source code of pyopencl, I figured out that the problem comes from functions that are generated at runtime. The documentation also doesn't make it explicit that those functions are available. So to load an array into a type<n>, you need to call the function cl.cltypes.make_type<n> and set the dtype to cl.cltypes.type<n>. Because these are generated at runtime, static analysis can't see them in the module, so your IDE will not recognize them.
myFloat4 = cl.cltypes.make_float4(0,1,1,0)
myArrayFloat4 = np.array([myFloat4], dtype=cl.cltypes.float4)
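If the data already exists as plain floats, two other routes may be worth trying. The sketch below is NumPy-only and uses a hypothetical stand-in dtype, float4_like, in place of cl.cltypes.float4 (an assumption; the field names on the real dtype may differ, so inspect cl.cltypes.float4.names before relying on "x"). It shows filling a zero-initialized array by field name, and reinterpreting a C-contiguous (N, 4) float32 array as N records:

```python
import numpy as np

# Hypothetical stand-in for cl.cltypes.float4 (assumption: four packed float32 fields).
float4_like = np.dtype([("x", np.float32), ("y", np.float32),
                        ("z", np.float32), ("w", np.float32)])

N = 10

# Option 1: start from zeros and set one component by field name.
filled = np.zeros(N, dtype=float4_like)
filled["x"] = 1.0

# Option 2: reinterpret a C-contiguous (N, 4) float32 array as N records.
# The view has shape (N, 1), so reshape it to (N,).
plain = np.tile(np.array([1, 0, 0, 0], dtype=np.float32), (N, 1))
viewed = plain.view(float4_like).reshape(N)

print(filled.shape, viewed.shape)
print(viewed[0])
```

The view in option 2 shares memory with plain, so no data is copied on the host.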
So, for completeness, here's my fix:
import pyopencl as cl
import numpy as np
kernelSource = """
__kernel void addOneToFloat4(__global float4 *a)
{
int gid = get_global_id(0);
a[gid] += 1.0f;
}
"""
context = cl.create_some_context()
queue = cl.CommandQueue(context)
device = context.devices[0]
program = cl.Program(context, kernelSource).build()
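N = 10
# make_float4 builds one float4 value, so the list gives an array of shape (N,)
HOST_array = np.array([cl.cltypes.make_float4(1, 0, 0, 0)]*N, dtype=cl.cltypes.float4)
# COPY_HOST_PTR already copies HOST_array into the buffer when it is created
TARGET_array = cl.Buffer(context, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_array)
# This explicit host-to-device copy is therefore redundant, but harmless
cl.enqueue_copy(queue, dest=TARGET_array, src=HOST_array)
# Run one work-item per float4 element
program.addOneToFloat4(queue, (N,), None, TARGET_array)
# Copy the result back to the host and wait for the queue to drain
cl.enqueue_copy(queue, dest=HOST_array, src=TARGET_array)
queue.finish()
print(HOST_array)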