For some reason, when calling cudaMemset and checking the value of my array, I don't get the garbage values I expected.
int main(){
float *elementsC;
# Allocate on host
cudaMallocHost(&elementsC, 4*sizeof(float));
# set all bytes to 1
cudaMemset(elementsC, 1, 4*sizeof(float));
printf("Check the last ele of C: %f\n", elementsC[0]);
}
When I compile and run it, I get the following:
Check the last ele of C: 0.000000
Can someone explain why I get all zeros in my array?
PS: Replacing the value 1 with any other value has no impact. I am compiling with:
nvcc -std=c++17
cudaMemset (like normal memset) treats the buffer as an array of bytes and fills the bytes one by one.
This is not the right API for initializing float values: using 1 means that every byte of each float is set to 0x01, so each float ends up with the bit pattern 0x01010101 in hexadecimal (because float is typically 4 bytes).
I assume floats are represented on your system in the usual way, according to the IEEE-754 standard. As you can check with the IEEE-754 Floating Point Converter, 0x01010101 represents the floating-point value 2.3694278e-38.
This value is extremely close to 0 (the decimal exponent is -38), so printf with %f, which prints 6 digits after the decimal point by default, simply displays it as 0.000000.
If you use %e instead of %f (%g will do the job as well here), you will see the value in scientific notation. On my system it shows 2.369428e-38.
See minimal live demo (Godbolt).
A better way to initialize an array of floats, since you allocated the data on the host, is to use a loop and assign a proper value to each float. (If the data were in device memory, you could use cudaMemcpy to initialize it from a host array of floats.)
Note:
The CUDA functions return a value of type cudaError_t. You should always check it to make sure the call succeeded (they return cudaSuccess in that case).
A side note:
# is not a valid prefix for comments in C++; you should use // instead.
A final note:
The cudaMemset documentation mentions that it is meant for setting device memory. Initially I thought your problem was that it does not support host memory, but at least on my platform this is not the case, and it works with host memory as well.