Tags: c++, reinterpret-cast, pointer-conversion

C++: is reinterpret_cast the best choice in these scenarios?


This has been bugging me for a very long time: how to do pointer conversion from anything to char * to dump binary to disk.

In C, you don't even think about it.

double d = 3.14;
char *cp = (char *)&d;

// do whatever you would do to dump to disk

However, in C++, where everyone says C-style casts are frowned upon, I've been doing this:

double d = 3.14;
auto cp = reinterpret_cast<char *>(&d);

Now this is copied from cppreference, so I assume this is the proper way.

However, I've read in multiple sources that this is UB (e.g. this one). So I can't help but wonder whether there is any "DB" (defined behavior) way at all (according to that post, there is none).

Another scenario I often encounter is to implement an API like this:

void serialize(void *buffer);

where you would dump a lot of things to this buffer. Now, I've been doing this:

void serialize(void *buffer) {
    int intToDump;
    float floatToDump;

    int *ip = reinterpret_cast<int *>(buffer);
    ip[0] = intToDump;

    float *fp = reinterpret_cast<float *>(&ip[1]);
    fp[0] = floatToDump;
}

Well, I guess this is UB as well.

Now, is there truly no "DB" way to accomplish either of these tasks? I've seen someone use uintptr_t to do something similar to the serialize task, treating the pointer as an integer and doing arithmetic on it with sizeof, but I'm guessing that it's UB as well.

Even though these are UB, compiler writers usually do the rational thing to make sure everything works. And I'm okay with that: it's not an unreasonable thing to ask for.

So my questions really are, for the two common tasks mentioned above:

  1. Is there truly no "DB" way to accomplish them that will satisfy the ultimate C++ freaks?
  2. Any better way to accomplish them other than what I've been doing?

Thanks!


Solution

  • Your serialize implementation's behavior is undefined because you violate the strict aliasing rules. The strict aliasing rules say, in short, that you cannot access an object via a pointer or reference to a different type. There is one major exception to that rule though: any object may be accessed via a pointer to char, unsigned char, or (since C++17) std::byte. That exception is why your first snippet, which looks at a double through a char *, is fine. Note that the exception does not apply the other way around; a char array may not be accessed via a pointer to a type other than char.

    That means that you can make your serialize function well-defined by changing it as follows:

    #include <cstddef>  // std::size_t
    #include <cstring>  // std::memcpy

    // The caller must provide at least sizeof(int) + sizeof(float) bytes.
    void serialize(char* buffer) {
        int intToDump = 42;
        float floatToDump = 3.14f;

        // Copy each object's bytes into the buffer, one after the other
        std::memcpy(buffer, &intToDump, sizeof(intToDump));
        std::memcpy(buffer + sizeof(intToDump), &floatToDump, sizeof(floatToDump));

        // Or you could do byte-by-byte manual copy loops
        // i.e.
        //for (std::size_t i = 0; i < sizeof(intToDump); ++i, ++buffer) {
        //    *buffer = reinterpret_cast<char*>(&intToDump)[i];
        //}
        //for (std::size_t i = 0; i < sizeof(floatToDump); ++i, ++buffer) {
        //    *buffer = reinterpret_cast<char*>(&floatToDump)[i];
        //}
    }
    

    Here, rather than casting buffer to a pointer to an incompatible type, std::memcpy copies the bytes of the object being serialized as if through a pointer to unsigned char. The strict aliasing rules are therefore not violated, and the program's behavior remains well-defined. Note that the resulting byte layout is still not portable, as it depends on your platform (for example, on your CPU's endianness).
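
The same reasoning applies when reading the data back: because the char exception does not work in reverse, the bytes should be copied into a real object with std::memcpy rather than read through a cast pointer. Below is a minimal sketch of both directions. The helper names appendBytes and readBytes are hypothetical (they are not part of the answer above or of any library), and the sketch assumes the values are trivially copyable and are read back on the same platform that wrote them.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends the object representation of a trivially
// copyable value to a byte buffer.
template <typename T>
void appendBytes(std::vector<unsigned char>& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-wise");
    // Inspecting an object through unsigned char* is allowed by the
    // aliasing rules, so this cast is fine.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Hypothetical helper: reads a T back from raw bytes. The bytes are copied
// into a real T object with memcpy instead of casting src to const T*,
// which would violate the aliasing rules (and may also be misaligned).
template <typename T>
T readBytes(const unsigned char* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-wise");
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}
```

For example, after calling appendBytes(buf, 42) and appendBytes(buf, 3.14f) on an empty std::vector<unsigned char> buf, readBytes<int>(buf.data()) and readBytes<float>(buf.data() + sizeof(int)) give back the original values. The float sits at an offset that may not be suitably aligned for a float, which is another reason to copy it out rather than dereference a cast pointer.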