Why is cudaMemset not setting bytes to the given value?

For some reason, when I call cudaMemset and then check the value in my array, I don't get the garbage values I expected.

int main(){

    float *elementsC;
    # Allocate on host
    cudaMallocHost(&elementsC, 4*sizeof(float));
    # set all bytes to 1
    cudaMemset(elementsC, 1, 4*sizeof(float));
    printf("Check the last ele of C: %f\n", elementsC[0]);
}

When I compile and run it, I get the following output:

Check the last ele of C: 0.000000

Can someone explain why I get all zeros in my array?

PS: Changing the value 1 to any other value has no effect. I am compiling with:

nvcc -std=c++17

Solution

cudaMemset (like the standard memset) treats the buffer as an array of bytes and fills it byte by byte, setting every byte to the given value.

This is not the right API for initializing float values:

Using 1 means that every byte is set to 0x01, so each float (typically 4 bytes) ends up with the hexadecimal bit pattern 0x01010101.

I assume floats on your system are represented as they usually are, according to the IEEE-754 standard.
As you can verify with the IEEE-754 Floating Point Converter, 0x01010101 represents the floating-point value 2.3694278e-38.
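To see where that number comes from, the bit pattern can be decoded by hand using the IEEE-754 single-precision (binary32) layout: 1 sign bit, 8 exponent bits with a bias of 127, and 23 fraction bits. This assumes float is binary32, as above.

```latex
\begin{aligned}
\texttt{0x01010101} &= \underbrace{0}_{\text{sign}}\;
  \underbrace{00000010}_{\text{exponent } e = 2}\;
  \underbrace{00000010000000100000001}_{\text{fraction } f = \texttt{0x010101} = 65793} \\
x &= (-1)^{0} \times \left(1 + \frac{65793}{2^{23}}\right) \times 2^{\,2 - 127} \\
  &\approx 1.0078431 \times 2.3509887 \times 10^{-38} \\
  &\approx 2.3694278 \times 10^{-38}
\end{aligned}
```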
That value is extremely close to 0 (its decimal exponent is -38), and since printf with %f prints only six digits after the decimal point, it simply displays 0.000000.
If you use %e instead of %f (%g works here as well), you will see the value in scientific notation. On my system it shows 2.369428e-38.

See the minimal live demo (Godbolt).

Since you allocated the data on the host, a better way to initialize the array of floats is a simple loop that assigns each element the value you want.
(If the data were in device memory, you could initialize it by copying from a host array of floats with cudaMemcpy.)

Note:
The CUDA runtime functions return a value of type cudaError_t.
You should always check it to make sure the call succeeded (a successful call returns cudaSuccess).

A side note:
# is not a valid prefix for comments in C++ (a line starting with # is parsed as a preprocessor directive); use // instead.

A final note:
The cudaMemset documentation says it is meant for setting device memory. Initially I thought your problem was caused by it not supporting host memory, but at least on my platform that is not the case: it works with host memory as well.