I am using a global variable, say d_myVar, which will be allocated device memory using cudaMalloc() in the main function. I am not clear on whether I should put __device__ in front of it in the global declaration. I ask because if it were a local variable on the host that was passed to a kernel, we would not write __device__ in front of it. Let me know if I am wrong.
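Globally-scoped __device__ variables are not allocated with cudaMalloc. Their storage is reserved statically in device memory, and the host reads and writes them with cudaMemcpyToSymbol and cudaMemcpyFromSymbol. Simply annotate a variable in the global scope with __device__: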
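#include <stdio.h>

// Statically allocated in device global memory; no cudaMalloc needed.
__device__ int d_myVar;

__global__ void foo()
{
    printf("d_myVar is %d\n", d_myVar);
}

int main()
{
    int h_myVar = 13;

    // Copy the host value into the device symbol.
    cudaMemcpyToSymbol(d_myVar, &h_myVar, sizeof(int), 0, cudaMemcpyHostToDevice);

    foo<<<1,1>>>();

    // Wait for the kernel so its printf output is flushed before exit.
    // cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize().
    cudaDeviceSynchronize();
    return 0;
}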
The result:
$ nvcc -arch=sm_20 test.cu -run
d_myVar is 13
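
For comparison, here is a rough sketch of the cudaMalloc approach the question describes. The global is an ordinary host-side pointer with no __device__ qualifier, because the pointer itself lives in host memory and only the address it holds refers to device memory. The kernel name addOne and the helper runExample are hypothetical names made up for this illustration, and error handling is kept minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ordinary host-side global pointer: no __device__ qualifier.
// It will hold a device address returned by cudaMalloc.
int *d_myVar = NULL;

// The kernel receives the device address as a normal argument.
__global__ void addOne(int *p)
{
    *p += 1;
}

// Hypothetical helper: copies `value` to the device, increments it there,
// and returns the value read back. Returns -1 if any CUDA call fails.
int runExample(int value)
{
    int result = -1;

    if (cudaMalloc((void **)&d_myVar, sizeof(int)) != cudaSuccess)
        return -1;

    bool ok = cudaMemcpy(d_myVar, &value, sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        addOne<<<1, 1>>>(d_myVar);
        // This blocking copy on the default stream waits for the kernel
        // to finish before reading the result back.
        ok = cudaMemcpy(&result, d_myVar, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_myVar);
    d_myVar = NULL;
    return ok ? result : -1;
}

int main()
{
    printf("result is %d\n", runExample(13));
    return 0;
}
```

So the rule of thumb is: use __device__ for a variable that itself lives on the device, and a plain pointer (global or local) for memory obtained from cudaMalloc. Declaring the pointer as __device__ and then passing its address to cudaMalloc from host code would not work as intended, since host code cannot take a usable address of a __device__ variable that way.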