c++ cuda

Why is cudaMemset not setting bytes to the given value?


For some reason, when I call cudaMemset and then check the values in my array, I don't get the garbage values I expect.

int main(){

    float *elementsC;
    # Allocate on host
    cudaMallocHost(&elementsC, 4*sizeof(float));
    # set all bytes to 1
    cudaMemset(elementsC, 1, 4*sizeof(float));
    printf("Check the last ele of C: %f\n", elementsC[0]);
}

When I run the program I get the following output:

Check the last ele of C: 0.000000

Can someone explain why I get all zeros in my array?

PS: Replacing the value 1 with any other value makes no difference. I am compiling with:

nvcc -std=c++17

Solution

  • cudaMemset (like normal memset) treats the buffer as an array of bytes and fills those bytes one by one.

    This is not the right API for initializing float values:

    Using 1 means that every byte is set to 0x01, so each float ends up with the bit pattern 0x01010101 in hexadecimal (because a float is typically 4 bytes).

    I assume floats are represented on your system in the usual way, according to the IEEE-754 standard.
    As you can check with the IEEE-754 Floating Point Converter, 0x01010101 represents the floating-point value 2.3694278e-38.
    This value is extremely close to 0 (the decimal exponent is -38), so printf with %f, which prints 6 digits after the decimal point by default, simply displays it as 0.000000.
    If you use %e instead of %f (%g will do the job as well here), you will see the scientific notation. On my system it shows 2.369428e-38.

    See minimal live demo (Godbolt).

    Since you allocated the data on the host, the better way to initialize an array of floats is to use a loop and assign a proper value to each element.
    (If the data were in device memory, you could use cudaMemcpy to initialize it from a host array of floats.)

    Note:
    The cuda functions have a return value of type cudaError_t.
    You should always check it to make sure the call succeeded (they return the value cudaSuccess in that case).

    A side note:
    # is not a valid prefix for comments in C++; you should use // instead.

    A final note:
    The cudaMemset documentation says it is meant for setting device memory. Initially I thought your problem was caused by it not supporting host memory, but at least on my platform this is not the case, and it works with host memory as well.