I have a custom struct on which I want to run an operation that decreases the field scalar1 on all my structs. It is a very straightforward operation. The subtraction does seem to happen, but OpenCL applies it to the wrong data. Here is an MWE that should run on your computer:
import pyopencl as cl
import pyopencl.tools
import numpy as np
kernelSource = """
__kernel void decreaseScalarFieldBy(__global myStruct *a, float delta)
{
    int gid = get_global_id(0);
    a[gid].scalar1 -= delta;
}
"""
context = cl.create_some_context()
queue = cl.CommandQueue(context)
device = context.devices[0]
myStruct = np.dtype(
    [("vector1", cl.cltypes.float4),
     ("scalar1", cl.cltypes.float)])
name = "myStruct"
_, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)
myStruct_dtype = cl.tools.get_or_register_dtype(name, myStruct)
program = cl.Program(context, c_decl + kernelSource).build()
N = 10
HOST_struct = np.empty(N, dtype=myStruct_dtype)
HOST_struct["vector1"] = np.array([cl.cltypes.make_float4(1, 0, 0, 0)]*N, dtype=cl.cltypes.float4)
HOST_struct["scalar1"] = np.ones(N, dtype=cl.cltypes.float)
TARGET_struct = cl.Buffer(context, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_struct)
cl.enqueue_copy(queue, dest=TARGET_struct, src=HOST_struct)
program.decreaseScalarFieldBy(queue, (N,), None, TARGET_struct, np.float32(0.5))
cl.enqueue_copy(queue, dest=HOST_struct, src=TARGET_struct)
queue.finish()
print(HOST_struct)
And here's the output, in which the subtraction is shifted and sometimes lands inside the vector1 field:
[((1. , 0. , 0. , 0. ), 0.5) ((1. , 0. , 0. , 0. ), 1. )
((1. , 0. , -0.5, 0. ), 1. ) ((1. , 0. , 0. , 0. ), 1. )
((0.5, 0. , 0. , 0. ), 1. ) ((1. , 0. , 0. , -0.5), 1. )
((1. , 0. , 0. , 0. ), 1. ) ((1. , -0.5, 0. , 0. ), 1. )
((1. , 0. , 0. , 0. ), 0.5) ((1. , 0. , 0. , 0. ), 1. )]
Can someone explain to me what is wrong with this code?
I've tried float and float4 separately, and they both work perfectly outside a struct.
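The concept of memory alignment was unknown to me.
From the OpenCL 1.2 specification, section 6.1.5 ("Alignment of Types"), we learn that a data type is aligned to its size in bytes, which is always a power of two, so a float4 sits on a 16-byte boundary. As in C, a struct containing a float4 takes that 16-byte alignment and its size is padded up to a multiple of it. The fields of the struct in the MWE add up to 20 bytes (sizeof(float4) = 16 bytes, sizeof(float) = 4 bytes), so OpenCL pads each element to 32 bytes, and my host data was laid out with the wrong stride.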
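The shifted output can be reproduced with NumPy alone. This is a minimal sketch, not the original code: it assumes the device pads the struct to 32 bytes, uses a float32 sub-array as a stand-in for cl.cltypes.float4 so it runs without PyOpenCL, and replays the kernel's writes at the byte offsets the device would use on a buffer that the host filled with a 20-byte stride.

```python
import numpy as np

# Stand-in for the question's dtype: a float32[4] sub-array replaces
# cl.cltypes.float4 (assumption made so the sketch needs only NumPy).
packed = np.dtype([("vector1", np.float32, (4,)), ("scalar1", np.float32)])

N = 10
DEVICE_STRIDE = 32   # assumed sizeof(myStruct) on the device (16 + 4 + 12 padding)
SCALAR_OFFSET = 16   # scalar1 follows the 16-byte float4

# One raw byte buffer, viewed both as host structs and as flat float32 values.
raw = np.zeros(N * packed.itemsize, dtype=np.uint8)
host = raw.view(packed)          # host layout: 20-byte stride, no padding
flat = raw.view(np.float32)

host["vector1"][:, 0] = 1.0
host["scalar1"] = 1.0

# Replay "a[gid].scalar1 -= delta" using the device's stride.
for gid in range(N):
    byte_offset = gid * DEVICE_STRIDE + SCALAR_OFFSET
    if byte_offset + 4 <= raw.nbytes:   # later work-items would write out of bounds
        flat[byte_offset // 4] -= 0.5

print("host itemsize:", packed.itemsize)
print(host)
```

With these assumptions, the values that change are the same ones that change in the output shown in the question, which suggests the device is stepping through the buffer in 32-byte strides.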
However, numpy does not lay out struct fields the way OpenCL does: by default it packs the fields of a structured dtype with no padding. Thus, the numpy dtype has to be matched to the OpenCL layout. This is the job of
_, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)
The problem is that my code discarded the first return value, the adjusted dtype, by assigning it to _. What needs to be done is to keep that adjusted dtype and use it in place of the original numpy declaration:
myStruct = np.dtype(
    [("position", cl.cltypes.float4),
     ("direction", cl.cltypes.float4),
     ("weight", cl.cltypes.float)])
name = "photonStruct"
myStruct, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)
myStruct_dtype = cl.tools.get_or_register_dtype(name, myStruct)
Now everything works fine.
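To illustrate why the adjusted dtype helps, here is a sketch of a hand-built padded dtype for the two-field struct from the question. The explicit offsets and the 32-byte itemsize are assumptions about what a typical device requires; in real code the dtype returned by match_dtype_to_c_struct should be used rather than hard-coded numbers. As before, a float32 sub-array stands in for cl.cltypes.float4.

```python
import numpy as np

# Hypothetical hand-built equivalent of the adjusted dtype: explicit offsets
# plus an itemsize of 32 add the 12 bytes of tail padding.
padded = np.dtype({
    "names": ["vector1", "scalar1"],
    "formats": [np.dtype((np.float32, (4,))), np.float32],
    "offsets": [0, 16],
    "itemsize": 32,
})

N = 10
DEVICE_STRIDE = 32   # assumed sizeof(myStruct) on the device
SCALAR_OFFSET = 16

raw = np.zeros(N * padded.itemsize, dtype=np.uint8)
host = raw.view(padded)          # host layout now matches the device stride
flat = raw.view(np.float32)

host["vector1"][:, 0] = 1.0
host["scalar1"] = 1.0

# Replay "a[gid].scalar1 -= delta" using the device's stride.
for gid in range(N):
    flat[(gid * DEVICE_STRIDE + SCALAR_OFFSET) // 4] -= 0.5

print("host itemsize:", padded.itemsize)
print(host["scalar1"])
```

Because the host stride and the assumed device stride now agree, every write lands on scalar1 and vector1 is left untouched.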