I would like to understand in more detail when reinterpret_cast is safe and when it causes undefined behaviour. Below is a sample program I put together for discussion. I understand that using memcpy to copy into an existing object is well defined as long as the type is trivially copyable. What I am interested in is whether the reinterpret_cast is well defined.
#include <cstring>
#include <type_traits>
// memcpy is only well defined for trivially copyable types, so constrain on that trait.
template<typename T, std::enable_if_t<std::is_trivially_copyable_v<T>>* = nullptr>
void serialize(T const& source, unsigned char* buffer) {
    std::memcpy(buffer, &source, sizeof(T));
}
template<typename T, std::enable_if_t<std::is_trivially_copyable_v<T>>* = nullptr>
T deserialize(unsigned char* buffer) {
    T entity;
    std::memcpy(&entity, buffer, sizeof(T));
    return entity;
}
template<typename T, std::enable_if_t<std::is_trivially_copyable_v<T>>* = nullptr>
T* view_as(unsigned char* buffer) {
    return reinterpret_cast<T*>(buffer);
}
struct point {
    int x;
    int y;
};
int main() {
    point p1{1,2};
    point p2{3,4};
    alignas(point) unsigned char buffer[2 * sizeof(point)];
    // These calls should be fine, as we are copying trivially copyable types to a correctly aligned buffer
    serialize(p1, buffer);
    serialize(p2, buffer + sizeof(point));
    // I believe memcpying from a buffer into an object whose lifetime has started is valid
    auto p3 = deserialize<point>(buffer);
    auto p4 = deserialize<point>(buffer + sizeof(point));
    // As I copied these in and they were originally valid objects, can I do the following 2 lines safely?
    auto* p5 = view_as<point>(buffer);
    auto* p6 = view_as<point>(buffer + sizeof(point));
    return 0;
}
I have taken an excerpt from the C++17 standard on accessing the value of an object through a pointer to a different type:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- (11.1) the dynamic type of the object,
- (11.2) a cv-qualified version of the dynamic type of the object,
- (11.3) a type similar (as defined in 7.5) to the dynamic type of the object,
- (11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- (11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- (11.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- (11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- (11.8) a char, unsigned char, or std::byte type.
When a value is written through std::memcpy as shown in the code above, is it undefined behaviour to later read it through a reinterpret_cast, as shown in the example with the view_as function template? What is the best way to obtain optimal performance and safety, if that is possible?
return reinterpret_cast<T*>(buffer);
The only issue with this approach in your example is that you're not using std::launder. See also What is the purpose of std::launder?
If you wrote
return std::launder(reinterpret_cast<T*>(buffer));
... then you could use the returned pointer to access the T within buffer.
std::launder is needed because otherwise the result of the reinterpret_cast still points to an unsigned char, and as the rules you've cited state, you cannot access an object of type unsigned char through a glvalue of type point.
However, serialize uses memcpy to place point objects into the buffer, which implicitly creates objects, so an actual point object exists at the address of buffer, and std::launder obtains a pointer to that object, not just to some byte.
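As a minimal sketch of the laundered variant described above: view_as_laundered and sum_via_view are hypothetical names introduced here for illustration, and the alignment assert is an added precaution rather than something the original view_as does. The sketch assumes the bytes were placed with std::memcpy beforehand, so that a point object has been implicitly created in the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>          // std::launder (C++17)
#include <type_traits>

struct point {
    int x;
    int y;
};

// Hypothetical variant of view_as that launders the pointer.
// It still cannot verify that a T actually lives in the buffer;
// that remains the caller's responsibility.
template <typename T>
T* view_as_laundered(unsigned char* buffer) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "T must be trivially copyable");
    // A misaligned pointer would be a problem regardless of std::launder.
    assert(reinterpret_cast<std::uintptr_t>(buffer) % alignof(T) == 0);
    return std::launder(reinterpret_cast<T*>(buffer));
}

// Copies a point into aligned storage, then reads it back through the view.
int sum_via_view() {
    alignas(point) unsigned char buffer[sizeof(point)];
    point const source{1, 2};
    // Assumption: this memcpy implicitly creates a point object in buffer.
    std::memcpy(buffer, &source, sizeof(point));
    point* view = view_as_laundered<point>(buffer);
    return view->x + view->y;
}
```

Calling sum_via_view() reads both members through the laundered pointer without copying the object out of the buffer.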
Keep in mind that you cannot use this to type-pun some byte array in general. Actual point objects need to have been created inside it first, potentially implicitly. view_as has no way to ensure that, or to check for it, so it's extremely unsafe.
That's why using std::memcpy as in deserialize is currently the safest and most robust option.
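To illustrate why the copy-based approach is more forgiving, here is a small sketch that reads a point from a deliberately misaligned offset. read_copy and read_unaligned_y are hypothetical names used only for this example; the point is that copying into a properly declared local object needs neither suitable alignment of the source bytes nor an existing point object inside the buffer.

```cpp
#include <cstring>
#include <type_traits>

struct point {
    int x;
    int y;
};

// Hypothetical helper: copies sizeof(T) bytes into a local T and returns it.
template <typename T>
T read_copy(unsigned char const* bytes) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "T must be trivially copyable");
    static_assert(std::is_default_constructible_v<T>,
                  "T must be default constructible");
    T value;
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

// Stores a point at offset 1 (not aligned for point) and reads it back.
int read_unaligned_y() {
    unsigned char buffer[1 + sizeof(point)] = {};
    point const source{3, 4};
    std::memcpy(buffer + 1, &source, sizeof(point));
    return read_copy<point>(buffer + 1).y;
}
```

Compilers commonly turn such a fixed-size memcpy into plain loads, so the copy is often not a performance concern in practice, though that depends on the toolchain and optimisation level.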
C++23 also adds std::start_lifetime_as to begin the lifetime of point objects in an existing byte array, but no one implements it yet.
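A sketch of how code might opt into std::start_lifetime_as once a standard library provides it, using the __cpp_lib_start_lifetime_as feature-test macro. The names as_point and sum_after_start_lifetime are hypothetical, and the fallback branch assumes, as above, that the bytes were written with std::memcpy so that a point already exists in the buffer.

```cpp
#include <cstring>
#include <memory>   // std::start_lifetime_as (C++23), where available
#include <new>      // std::launder

struct point {
    int x;
    int y;
};

// Hypothetical helper: obtains a point* for suitably aligned storage
// that holds the object representation of a point.
point* as_point(unsigned char* buffer) {
#if defined(__cpp_lib_start_lifetime_as)
    // Explicitly begins the lifetime of a point, keeping the stored bytes.
    return std::start_lifetime_as<point>(buffer);
#else
    // Fallback: only valid if a point was already created in the buffer,
    // for example implicitly by an earlier std::memcpy.
    return std::launder(reinterpret_cast<point*>(buffer));
#endif
}

int sum_after_start_lifetime() {
    alignas(point) unsigned char buffer[sizeof(point)];
    point const source{3, 4};
    std::memcpy(buffer, &source, sizeof(point));
    point* p = as_point(buffer);
    return p->x + p->y;
}
```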
See also What is the modern, correct way to do type punning in C++?