CUDA device pointer manipulation


I've used:

float *devptr;
//...
cudaMalloc(&devptr, sizeofarray);
cudaMemcpy(devptr, hostptr, sizeofarray, cudaMemcpyHostToDevice);

in CUDA C to allocate and populate an array. Now I'm trying to run a CUDA kernel, e.g.:

__global__ void kernelname(float *ptr)
{
   //...
}

on that array, but starting at an offset. In C/C++ it would be something like this:

kernelname<<<dimGrid, dimBlock>>>(devptr+offset);

However, this doesn't seem to work.

Is there a way to do this without passing the offset to the kernel as a separate argument and applying it inside the kernel code?


Solution

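  • Pointer arithmetic works just fine in CUDA. You can add an offset to a device pointer in host code and it will work correctly, as long as you remember that the offset is counted in elements of the pointed-to type, not in bytes. For a float pointer, devptr+offset therefore points offset*sizeof(float) bytes past devptr.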

    EDIT: A simple working example:

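    #include <cstdio>
    #include <cuda_runtime.h>
    
    int main(void)
    {
        const int na = 5, nb = 4;
        float a[na] = { 1.2f, 3.4f, 5.6f, 7.8f, 9.0f };
        float *_a, b[nb];   // _a: device copy of a; b: host buffer for the result
    
        size_t sza = size_t(na) * sizeof(float);
        size_t szb = size_t(nb) * sizeof(float);
    
        cudaFree(0);        // force CUDA context creation up front
    
        cudaMalloc((void **)&_a, sza);
        cudaMemcpy(_a, a, sza, cudaMemcpyHostToDevice);
        // _a+1 skips one float, so the copy back starts at a[1]
        cudaMemcpy(b, _a+1, szb, cudaMemcpyDeviceToHost);
    
        for (int i = 0; i < nb; i++)
            printf("%d %f\n", i, b[i]);
    
        cudaFree(_a);
        cudaDeviceReset();  // replaces the deprecated cudaThreadExit()
        return 0;
    }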
    

    Here, an element offset (_a+1) is applied to the device pointer in the second cudaMemcpy call, so the copy starts from the second element rather than the first, and b receives 3.4, 5.6, 7.8 and 9.0.
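The same arithmetic applies when the offset pointer is passed to a kernel launch, as in the question. The following is only a minimal sketch: scaleKernel and scaleFromOffset are hypothetical names, the block size of 128 is arbitrary, and the kernel is given the number of elements remaining after the offset so that it cannot run past the end of the allocation. It also checks the launch with cudaGetLastError, which may help pin down why a launch "doesn't seem to work".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each of the n elements starting at ptr.
__global__ void scaleKernel(float *ptr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        ptr[i] *= 2.0f;
}

// Hypothetical helper: copies `count` floats to the device, runs the kernel on
// the elements from index `offset` onward, and copies the whole array back.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t scaleFromOffset(float *host, int count, int offset)
{
    if (offset < 0 || offset >= count)
        return cudaErrorInvalidValue;

    float *devptr = NULL;
    size_t bytes = size_t(count) * sizeof(float);
    int n = count - offset;              // elements the kernel may touch

    cudaError_t err = cudaMalloc((void **)&devptr, bytes);
    if (err != cudaSuccess)
        return err;

    err = cudaMemcpy(devptr, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // devptr + offset advances by `offset` floats, not `offset` bytes.
        scaleKernel<<<(n + 127) / 128, 128>>>(devptr + offset, n);
        err = cudaGetLastError();        // reports launch failures
    }
    if (err == cudaSuccess) {
        // Blocking copy: waits for the kernel and reports execution errors.
        err = cudaMemcpy(host, devptr, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(devptr);
    return err;
}

int main(void)
{
    float a[5] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };

    cudaError_t err = scaleFromOffset(a, 5, 2);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 5; i++)
        printf("%d %f\n", i, a[i]);
    return 0;
}
```

With an offset of 2, elements 0 and 1 should come back unchanged and elements 2 to 4 doubled. No separate offset argument is needed inside the kernel; the kernel simply sees ptr[0] as the first element after the offset.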