pyopencl global_work_offset kernel argument


I want to use the global_work_offset argument of the OpenCL API function clEnqueueNDRangeKernel, but I can't figure out how to do that through the pyopencl API. Here is demo code in which I want to add an offset of 2 to the kernel invocation, so that get_global_id(0) starts at 2 instead of 0:

import pyopencl as cl 
import pyopencl.array 
import numpy as np

platform = cl.get_platforms()[0]
devices = platform.get_devices()[1] #gpu
context = cl.Context(devices=[devices])
queue =  cl.CommandQueue(context)

kernel = cl.Program(context, """
    __kernel void derp(global char* a) {
        a[get_global_id(0)] = 1;
    }""").build()

buffarr = cl.array.zeros(queue, 4, dtype=np.uint8)
kernel.derp(queue, (2,), None, buffarr.data)

np_data = buffarr.get()

# within this demo the buffer contains currently [1,1,0,0]
assert np.array_equal(np_data, [0,0,1,1])

How can I change the code so that the assertion doesn't fail? I don't want to add an extra argument to the kernel code.


Solution

  • As per the pyopencl documentation, you can pass global_offset as a keyword argument when calling the kernel.

    The global size is the number of work-items to launch, not the length of the buffer; the offset only shifts their IDs. Two work-items with an offset of 2 get the IDs 2 and 3, so the kernel call becomes:

    kernel.derp(queue, (2,), None, buffarr.data, global_offset=(2,))
    

    The program with the change:

    import pyopencl as cl
    import pyopencl.array
    import numpy as np
    
    
    platform = cl.get_platforms()[2]  # index of the platform under test; adjust for your machine
    print(platform)
    device = platform.get_devices()[0]  # first device of that platform
    context = cl.Context(devices=[device])
    queue = cl.CommandQueue(context)
    
    kernel = cl.Program(context, """
        __kernel void derp(global char* a) {
            a[get_global_id(0)] = 1;
        }""").build()
    
    
    buffarr = cl.array.zeros(queue, 4, dtype=np.uint8)
    
    # Launch 2 work-items; global_offset shifts their IDs to 2 and 3.
    # A global size of 4 here would produce IDs 2..5 and write past the 4-element buffer.
    kernel.derp(queue, (2,), None, buffarr.data, global_offset=(2,))
    
    np_data = buffarr.get()
    print(np_data)
    # the buffer now contains [0,0,1,1]
    assert np.array_equal(np_data, [0,0,1,1])
    print("Ok")
    

    After the execution:

    On platform 0

    <pyopencl.Platform 'Intel(R) OpenCL' at 0x60bdc0>
    [0 0 1 1]
    Ok
    

    On platform 1

    <pyopencl.Platform 'Experimental OpenCL 2.0 CPU Only Platform' at 0xb60a20>
    [0 0 1 1]
    Ok
    

    On platform 2

    <pyopencl.Platform 'NVIDIA CUDA' at 0xff0440>
    [0 0 1 1]
    Ok
    

    Tested with Python 2.7.11 [MSC v.1500 64 bit (AMD64)] and pyopencl (2015, 1).
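
To see which buffer elements a given global size and offset will touch, it can help to emulate the index arithmetic in plain Python before launching anything. The sketch below assumes a 1-D launch, where get_global_id(0) runs from global_offset to global_offset + global_size - 1. The helper names touched_indices and emulate_derp are hypothetical and not part of pyopencl; the emulation raises an error where a real device may write out of bounds without reporting anything.

```python
def touched_indices(global_size, global_offset=0):
    """Return the values get_global_id(0) takes for a 1-D launch."""
    return [global_offset + i for i in range(global_size)]


def emulate_derp(buffer_len, global_size, global_offset=0):
    """Emulate the derp kernel (a[get_global_id(0)] = 1) on a Python list.

    Raises IndexError for any work-item whose ID falls outside the buffer.
    """
    buf = [0] * buffer_len
    for gid in touched_indices(global_size, global_offset):
        if gid >= buffer_len:
            raise IndexError(
                "work-item %d is outside a buffer of length %d" % (gid, buffer_len)
            )
        buf[gid] = 1
    return buf


print(touched_indices(2, 2))   # [2, 3]
print(emulate_derp(4, 2, 2))   # [0, 0, 1, 1]
```

For the 4-element buffer in the question, a global size of 2 with an offset of 2 stays in bounds, whereas emulate_derp(4, 4, 2) raises IndexError because the IDs would run from 2 to 5.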