jcuda

How does JCuda's cuLaunchKernel work?


I am trying to understand how to use CUDA in Java. I am using JCuda.

Everything was fine until I came across an example containing the code:

    // Set up the kernel parameters: A pointer to an array
    // of pointers which point to the actual values.
    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{numElements}),
        Pointer.to(deviceInputA),
        Pointer.to(deviceInputB),
        Pointer.to(deviceOutput)
    );

The kernel function prototype is:

    __global__ void add(int n, float *a, float *b, float *sum)

The question is: in terms of C, doesn't it look as if we are passing something like this?

    (***n, ***a, ***b, ***sum)

So basically, do we always have to write something like the following?

    Pointer kernelParameters = Pointer.to(double pointer, double pointer, ...)

Thank you


Solution

  • The cuLaunchKernel function of JCuda corresponds to the cuLaunchKernel function of CUDA. The signature of this function in CUDA is

    CUresult cuLaunchKernel(
        CUfunction f, 
        unsigned int gridDimX, 
        unsigned int gridDimY, 
        unsigned int gridDimZ, 
        unsigned int blockDimX, 
        unsigned int blockDimY, 
        unsigned int blockDimZ, 
        unsigned int sharedMemBytes, 
        CUstream hStream, 
        void** kernelParams, 
        void** extra) 
    

    where kernelParams is the only parameter that is relevant for this question. The documentation says:

    Kernel parameters can be specified via kernelParams. If f has N parameters, then kernelParams needs to be an array of N pointers. Each of kernelParams[0] through kernelParams[N-1] must point to a region of memory from which the actual kernel parameter will be copied.


    The key point here is the last sentence: The elements of the kernelParams array are not the actual kernel parameters. They only point to the actual kernel parameters.

    And indeed, this has the odd effect that for a kernel that receives a single float *pointer, you could basically set up the kernel parameters as follows:

    float *pointer = allocateSomeDeviceMemory();
    float **pointerToPointer = &pointer;
    float ***pointerToPointerToPointer = &pointerToPointer;
    // The cast is needed because float*** does not convert to void** implicitly
    void **kernelParams = (void**)pointerToPointerToPointer;
    

    (This is just to make clear that this is indeed a pointer to a pointer to a pointer. In reality, you wouldn't write it like that.)


    Now, the "structure" of the kernel parameters is basically the same for JCuda and for CUDA. Of course, you cannot take "the address of a pointer" in Java, but the number of indirections is the same. Imagine you have a kernel like this:

    __global__ void example(int value, float *pointer)
    

    In the CUDA C API, you can then define the kernel parameters as follows:

    int value = 123;
    float *pointer= allocateSomeDeviceMemory();
    
    int* pointerToValue = &value;
    float** pointerToPointer = &pointer;
    
    void *kernelParams[] = {
        pointerToValue,
        pointerToPointer
    };
    

    The setup is done analogously in the JCuda Java API:

    int value = 123;
    Pointer pointer= allocateSomeDeviceMemory();
    
    Pointer pointerToValue = Pointer.to(new int[]{value});
    Pointer pointerToPointer = Pointer.to(pointer);
    
    Pointer kernelParameters = Pointer.to(
        pointerToValue,
        pointerToPointer
    );
    

    The main difference that is relevant here is that you can write this a bit more concisely in C, using the address operator &:

    void *kernelParams[] = {
        &value,             // This can be imagined as a pointer to an int
        &pointer            // This can be imagined as a pointer to a pointer
    };
    

    But this is basically the same as in the example that you provided:

    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{value}),   // A pointer to an int
        Pointer.to(pointer)             // A pointer to a pointer
    );
    

    Again, the key point is that with something like

    void *kernelParams[] = {
        &value,
    };
    

    or

    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{value})
    );
    

    you are not passing the value to the kernel directly. Instead, you are telling CUDA: "Here is an array of pointers. The first pointer points to an int value. Copy the value from this memory location, and use it as the actual value for the kernel call".
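
    The copy step described above can be sketched in plain C, without any GPU involved. This is only an illustration under stated assumptions: fake_launch, example and demo are hypothetical names that are not part of CUDA or JCuda, the "kernel" is an ordinary host function, and the real driver's internals are certainly different. The sketch only shows how each argument is read from the memory that kernelParams[i] points to.

```c
#include <string.h>

/* Hypothetical host-side stand-in for a kernel with the signature
   example(int value, float *pointer). It is an ordinary C function here. */
static float example(int value, float *pointer)
{
    return (float)value + pointer[0];
}

/* Hypothetical stand-in for the launch step: kernelParams is an array of
   pointers, and each actual argument is copied out of the memory that the
   corresponding element points to. */
float fake_launch(void **kernelParams)
{
    int value;
    float *pointer;

    /* kernelParams[0] points to an int: copy the int itself. */
    memcpy(&value, kernelParams[0], sizeof value);

    /* kernelParams[1] points to a float*: copy the pointer itself. */
    memcpy(&pointer, kernelParams[1], sizeof pointer);

    return example(value, pointer);
}

/* Sets up the parameters the same way as in the C snippets above. */
float demo(void)
{
    int value = 123;
    float data[1] = { 0.5f };   /* plays the role of device memory */
    float *pointer = data;

    void *kernelParams[] = {
        &value,     /* a pointer to an int */
        &pointer    /* a pointer to a pointer */
    };
    return fake_launch(kernelParams);
}
```

    In this sketch the function receives the int 123 and the float pointer itself, not their addresses. The extra level of indirection exists only so that the launcher can accept parameters of any type and size through a single void** argument.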