OpenCL skipping data during operation on struct

I have a custom struct, and I want to decrease the field scalar1 on every struct in an array. It is a very straightforward operation. The subtraction does happen, but OpenCL applies it to the wrong data. Here is an MWE that should run on your computer.

import pyopencl as cl
import pyopencl.tools
import numpy as np


kernelSource = """
__kernel void decreaseScalarFieldBy(__global myStruct *a, float delta)
{
    int gid = get_global_id(0);
    a[gid].scalar1 -= delta;
}
"""

context = cl.create_some_context()
queue = cl.CommandQueue(context)
device = context.devices[0]

myStruct = np.dtype(
    [("vector1", cl.cltypes.float4),
     ("scalar1", cl.cltypes.float)])
name = "myStruct"
_, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)
myStruct_dtype = cl.tools.get_or_register_dtype(name, myStruct)

program = cl.Program(context, c_decl + kernelSource).build()

N = 10
HOST_struct = np.empty(N, dtype=myStruct_dtype)
HOST_struct["vector1"] = np.array([cl.cltypes.make_float4(1, 0, 0, 0)]*N, dtype=cl.cltypes.float4)
HOST_struct["scalar1"] = np.ones(N, dtype=cl.cltypes.float)
TARGET_struct = cl.Buffer(context, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR, hostbuf=HOST_struct)
cl.enqueue_copy(queue, dest=TARGET_struct, src=HOST_struct)

program.decreaseScalarFieldBy(queue, (N,), None, TARGET_struct, np.float32(0.5))

cl.enqueue_copy(queue, dest=HOST_struct, src=TARGET_struct)
queue.finish()

print(HOST_struct)

Here is the output. The subtraction is shifted, and for several elements it lands inside the vector1 field instead of scalar1:

[((1. ,  0. ,  0. ,  0. ), 0.5) ((1. ,  0. ,  0. ,  0. ), 1. )
 ((1. ,  0. , -0.5,  0. ), 1. ) ((1. ,  0. ,  0. ,  0. ), 1. )
 ((0.5,  0. ,  0. ,  0. ), 1. ) ((1. ,  0. ,  0. , -0.5), 1. )
 ((1. ,  0. ,  0. ,  0. ), 1. ) ((1. , -0.5,  0. ,  0. ), 1. )
 ((1. ,  0. ,  0. ,  0. ), 0.5) ((1. ,  0. ,  0. ,  0. ), 1. )]

Can someone explain to me what is wrong with this code? I've tried separately with float and float4 and they both work perfectly when outside a struct.

Solution

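The concept of memory alignment was unknown to me. From the OpenCL 1.2 specification, chapter 6.1, we learn that a type has to be aligned to its size in bytes, which is always a power of two, so a float4 must start on a 16-byte boundary. With sizeof(float4) = 16 bytes and sizeof(float) = 4 bytes, my real struct (two float4 fields and one float) has a packed size of 36 bytes, and the reduced struct in the MWE has a packed size of 20 bytes. Neither is a multiple of 16, so the structs are misaligned: OpenCL pads them to 48 and 32 bytes respectively, while the host buffer still uses the packed layout.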
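The shifted output in the question follows directly from this size mismatch. The sketch below is plain Python arithmetic with no OpenCL involved, and its helper names (`round_up`, `host_field_hit`) are made up for illustration. It assumes the device sees the MWE struct with a 32-byte stride while the host buffer uses the packed 20-byte stride, and it computes which host field each work-item actually writes to.

```python
# Sketch of the layout mismatch in the MWE (no OpenCL device needed).
FLOAT_SIZE = 4
FLOAT4_SIZE = 16  # a float4 is also 16-byte aligned in OpenCL C


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


# Host side: the fields are packed back to back.
host_stride = FLOAT4_SIZE + FLOAT_SIZE               # 20 bytes
# Device side: the struct size is padded to its largest member alignment.
device_stride = round_up(host_stride, FLOAT4_SIZE)   # 32 bytes
scalar_offset = FLOAT4_SIZE                           # scalar1 follows vector1


def host_field_hit(gid):
    """Return (host element index, field name) that work-item gid writes into."""
    byte = gid * device_stride + scalar_offset
    element, offset = divmod(byte, host_stride)
    if offset < FLOAT4_SIZE:
        return element, "vector1[%d]" % (offset // FLOAT_SIZE)
    return element, "scalar1"


# Work-items 6 to 9 would write past the 200-byte host buffer, so only 0 to 5 are listed.
hits = [host_field_hit(gid) for gid in range(6)]
for gid, (element, field) in enumerate(hits):
    print(gid, element, field)
```

The computed positions (element 0 scalar1, element 2 vector1[2], element 4 vector1[0], element 5 vector1[3], element 7 vector1[1], element 8 scalar1) match the entries that show 0.5 or -0.5 in the printed output.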

However, numpy does NOT lay out a structured dtype in the same way as OpenCL does: by default it packs the fields with no padding. Thus, the dtype has to be matched to the OpenCL layout. This is the job of

_, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)

The problem is that my code discarded the adjusted dtype returned by this call by assigning it to _. What needs to be done is to keep the returned dtype and use it in place of the original numpy declaration.

myStruct = np.dtype(
    [("position", cl.cltypes.float4),
     ("direction", cl.cltypes.float4),
     ("weight", cl.cltypes.float)])
name = "photonStruct"
myStruct, c_decl = cl.tools.match_dtype_to_c_struct(device, name, myStruct)
myStruct_dtype = cl.tools.get_or_register_dtype(name, myStruct)

Now everything works fine.
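A quick way to see the difference between the two layouts is to compare `itemsize` on the host. The sketch below uses only numpy and builds a padded dtype by hand for the struct from the MWE, modelling float4 as four float32 values. The hand-written offsets and the 32-byte item size are assumptions that mirror the padding described above; they are not the literal return value of `match_dtype_to_c_struct`, which depends on the device.

```python
import numpy as np

# Packed layout, as numpy builds it by default (float4 modelled as 4 x float32).
packed = np.dtype([("vector1", np.float32, (4,)), ("scalar1", np.float32)])

# Hand-written padded layout: same fields, item size rounded up to 32 bytes.
padded = np.dtype({
    "names": ["vector1", "scalar1"],
    "formats": [(np.float32, (4,)), np.float32],
    "offsets": [0, 16],
    "itemsize": 32,
})

print(packed.itemsize, padded.itemsize)

# An array of the padded dtype steps through memory 32 bytes at a time,
# which is the stride the kernel is assumed to use for the struct.
data = np.zeros(3, dtype=padded)
data["scalar1"] = 1.0
print(data.strides)
```

Printing `myStruct.itemsize` before and after the call to `match_dtype_to_c_struct` in the real code gives the same kind of check.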