I am designing a library that has a large contingent of CUDA kernels to perform parallel computations. All the kernels will be acting on a common object, say a computational grid, which is defined using C++ style objects. The computational domain doesn't necessarily need to be accessed from the host side, so creating it on the device side and keeping it there makes sense for now. I'm wondering if the following is considered "good practice":
Suppose my computational grid class is called Domain. First I define a global device-side variable to store the computational domain:
__device__ Domain* D;
Then I initialize the computational domain using a CUDA kernel:
__global__ void initDomain(paramType P){
D = new Domain(P);
}
Then, I perform computations using this domain with other kernels:
__global__ void doComputation(double *x,double *y){
D->doThing(x,y);
//...
}
If my domain remains fixed (i.e. kernels don't modify the domain once it's created), is this OK? Is there a better way? I initially tried creating the Domain object on the host side and copying it over to the device, but this turned out to be a hassle because Domain is a relatively complex type that makes it a pain to copy over using e.g. cudaMemcpy or even thrust::device_new (at least, I couldn't get it to work nicely).
Yes, it's OK.
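For reference, the pattern from the question could be fleshed out roughly as below. This is only a sketch: the members of paramType and Domain, the body of doThing, the freeDomain kernel and the runDeviceNewDemo helper are all made up for illustration. The two points worth noting are that the init kernel is launched with a single thread (otherwise every thread would allocate its own Domain), and that memory obtained with device-side new has to be released with delete inside a kernel, not with cudaFree.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for the question's paramType and Domain.
struct paramType { int n; double dx; };

class Domain {
public:
    __device__ explicit Domain(paramType p) : n_(p.n), dx_(p.dx) {}
    // Placeholder computation: each thread writes y[i] = dx * x[i].
    __device__ void doThing(const double* x, double* y) const {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_) y[i] = dx_ * x[i];
    }
private:
    int n_;
    double dx_;
};

__device__ Domain* D = nullptr;   // global device-side handle, as in the question

__global__ void initDomain(paramType P) { D = new Domain(P); }   // device-heap allocation
__global__ void freeDomain() { delete D; D = nullptr; }          // hypothetical cleanup kernel

__global__ void doComputation(const double* x, double* y) { D->doThing(x, y); }

// Hypothetical driver: returns true if every y[i] equals dx * x[i].
bool runDeviceNewDemo() {
    const int n = 256;
    paramType P{n, 0.5};
    double hx[n], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = i;

    double *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_y, n * sizeof(double));
    cudaMemcpy(d_x, hx, n * sizeof(double), cudaMemcpyHostToDevice);

    initDomain<<<1, 1>>>(P);                            // one thread only
    doComputation<<<(n + 127) / 128, 128>>>(d_x, d_y);
    cudaMemcpy(hy, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    freeDomain<<<1, 1>>>();                             // free on the device, not with cudaFree
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; i < n; ++i) ok = ok && (hy[i] == 0.5 * hx[i]);
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

If Domain allocates a lot of memory internally, keep in mind that device-side new draws from the device heap, which is 8 MB by default; it can be enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) before the first kernel that allocates is launched.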
Maybe you can improve performance by using __constant__. With this keyword, your object will be available in all your kernels in a very fast, cached memory.
To copy your object, you must use cudaMemcpyToSymbol. Please note there are some restrictions: your object will be read-only in your device code, it must not have a non-empty constructor (variables in constant memory cannot be dynamically initialized), and constant memory is limited to 64 KB.
You can find more information here
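A minimal sketch of the __constant__ variant, assuming the domain can be reduced to a small, trivially copyable struct with no pointers to other allocations. The fields of Domain, the body of doThing and the runConstantDemo helper are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical trivially copyable domain: plain data, no user-written constructor.
struct Domain {
    int n;
    double dx;
    // Placeholder computation: each thread writes y[i] = dx * x[i].
    __device__ void doThing(const double* x, double* y) const {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = dx * x[i];
    }
};

__constant__ Domain D;   // constant memory: read-only from device code

__global__ void doComputation(const double* x, double* y) { D.doThing(x, y); }

// Hypothetical driver: returns true if every y[i] equals dx * x[i].
bool runConstantDemo() {
    const int n = 256;
    Domain h{n, 0.5};    // build the object on the host
    // Copy the host object into the __constant__ symbol.
    if (cudaMemcpyToSymbol(D, &h, sizeof(Domain)) != cudaSuccess) return false;

    double hx[n], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = i;

    double *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_y, n * sizeof(double));
    cudaMemcpy(d_x, hx, n * sizeof(double), cudaMemcpyHostToDevice);

    doComputation<<<(n + 127) / 128, 128>>>(d_x, d_y);
    bool ok = (cudaMemcpy(hy, d_y, n * sizeof(double),
                          cudaMemcpyDeviceToHost) == cudaSuccess);

    for (int i = 0; i < n; ++i) ok = ok && (hy[i] == 0.5 * hx[i]);
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

This only fits when the whole object is small and self-contained. If Domain owns large arrays, the usual compromise is to keep those in global memory and put just the small descriptor (sizes, spacings, device pointers) in constant memory.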
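If your object is complex and hard to copy, you could look into Unified Memory: the object is then reachable from both host and device, so you can just pass your variable (typically a pointer to the managed object) by value to your kernel.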
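Here is a rough sketch of the Unified Memory route, assuming a device that supports managed memory. The object is constructed on the host with placement new inside a cudaMallocManaged allocation, and the kernel receives a pointer to it. The members of paramType and Domain and the runManagedDemo helper are invented for the example.

```cuda
#include <new>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the question's paramType and Domain.
struct paramType { int n; double dx; };

class Domain {
public:
    // Constructible on the host; usable from device code.
    __host__ __device__ explicit Domain(paramType p) : n_(p.n), dx_(p.dx) {}
    // Placeholder computation: each thread writes y[i] = dx * x[i].
    __device__ void doThing(const double* x, double* y) const {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_) y[i] = dx_ * x[i];
    }
private:
    int n_;
    double dx_;
};

__global__ void doComputation(const Domain* D, const double* x, double* y) {
    D->doThing(x, y);
}

// Hypothetical driver: returns true if every y[i] equals dx * x[i].
bool runManagedDemo() {
    const int n = 256;
    Domain* D = nullptr;
    double *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&D, sizeof(Domain)) != cudaSuccess) return false;
    if (cudaMallocManaged(&x, n * sizeof(double)) != cudaSuccess) return false;
    if (cudaMallocManaged(&y, n * sizeof(double)) != cudaSuccess) return false;

    new (D) Domain(paramType{n, 0.5});      // construct in managed memory on the host
    for (int i = 0; i < n; ++i) x[i] = i;

    doComputation<<<(n + 127) / 128, 128>>>(D, x, y);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);   // wait before reading y on the host

    for (int i = 0; i < n; ++i) ok = ok && (y[i] == 0.5 * x[i]);

    D->~Domain();                           // explicit destructor call pairs with placement new
    cudaFree(D);
    cudaFree(x);
    cudaFree(y);
    return ok;
}
```

If Domain itself holds pointers to other buffers, those buffers would also need to live in managed (or device) memory for the kernel to follow them.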