My coworker wants to send some data represented by a type T over a network. He does this The traditional way™ by casting the T to char* and sending it using a write(2) call on a socket:
auto send_some_t(int sock, T const* p) -> void
{
    auto buffer = reinterpret_cast<char const*>(p);
    write(sock, buffer, sizeof(T));
}
So far, so good. This simplified example, apart from being stripped of any error checking, should be correct. Assuming the type T is trivially copyable, we can copy values of this type between objects using std::memcpy() (according to 6.7 [basic.types] point 3 in the C++17 standard[1]), so I guess write(2) should also work, since it blindly copies binary data.
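As an aside, this precondition can be checked at compile time. A minimal sketch (the static_assert and the includes are my additions, and T is the struct defined below):

#include <type_traits>
#include <unistd.h>

auto send_some_t(int sock, T const* p) -> void
{
    // Only trivially copyable types may be copied byte by byte ([basic.types]/3).
    static_assert(std::is_trivially_copyable_v<T>,
                  "T must be trivially copyable to be sent as raw bytes");
    auto buffer = reinterpret_cast<char const*>(p);
    write(sock, buffer, sizeof(T));
}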
Where it gets tricky is on the receiving side. Assume the type T in question looks like this:
struct T {
    uint64_t foo;
    uint8_t  bar;
    uint16_t baz;
};
It has a field with an alignment requirement of 8 bytes (foo), so the whole type requires a strict alignment of 8 bytes (see the example for 6.6.5 [basic.align] point 2). This means that storage for values of type T may only be allocated at suitably aligned addresses.
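To illustrate, here is a small program of my own (the exact numbers depend on the implementation's padding rules; the standard only guarantees that alignof(T) is at least alignof(uint64_t)):

#include <cstdint>
#include <cstdio>

// The T from above, repeated so the snippet is self-contained.
struct T {
    uint64_t foo;
    uint8_t  bar;
    uint16_t baz;
};

int main()
{
    // On a typical x86-64 implementation this prints alignof(T)=8 and sizeof(T)=16
    // (1 byte of padding after bar, 4 bytes of tail padding).
    std::printf("alignof(T)=%zu sizeof(T)=%zu\n", alignof(T), sizeof(T));
}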
Now, what about the following code?
auto receive_some_t(int sock, T* p) -> void
{
    read(sock, p, sizeof(T));
}
// ...
T value;
receive_some_t(sock, &value);
Looks shady, but should work OK. The bytes received do represent a valid value of type T and are blindly copied into a valid object of type T.
However, what about using raw char buffers like in the following code?
char buffer[sizeof(T)];
read(sock, buffer, sizeof(T));
T* value = reinterpret_cast<T*>(buffer);
This is where my coder-brain triggers a red alert. We have absolutely no guarantee that the alignment of char[sizeof(T)] matches that of T, which is a problem. We also do not round-trip a pointer to a valid T object, because there never was an object of type T at that address. And we don't know what compiler and options were used on the other side (maybe the struct on the other side is packed while ours is not).
In short, I see some potential problems with just casting raw char buffers to other types and would try to avoid writing code such as the above. But apparently it works and is how "everybody does it".
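A concrete illustration of the alignment concern (my example, not from the code above): a plain char array only has an alignment requirement of 1, so nothing stops it from landing right after another char member:

#include <cstddef>
#include <cstdio>

struct Holder {
    char pad;
    char buffer[16];   // alignof(char[16]) is 1, so this lands right after pad
};

int main()
{
    // offsetof(Holder, buffer) is 1 on common implementations, i.e. the buffer
    // is not 8-byte aligned within Holder; casting it to T* would yield a
    // misaligned pointer.
    std::printf("offsetof = %zu, alignof = %zu\n",
                offsetof(Holder, buffer), alignof(char[16]));
}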
My question is: is recovering structs sent over a network and received into a char buffer of appropriate size legal according to the C++17 standard?
If not, what about using std::aligned_storage<sizeof(T), alignof(T)> to receive such structs? And if std::aligned_storage is not legal either, is there any legal way of sending raw structs over a network, or is it a bad idea that just happens to work... until it doesn't?
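For reference, the std::aligned_storage variant I have in mind would look roughly like this (my sketch; whether the final cast is legal is exactly what I'm asking):

#include <type_traits>
#include <unistd.h>

std::aligned_storage_t<sizeof(T), alignof(T)> storage;
read(sock, &storage, sizeof(T));
T* value = reinterpret_cast<T*>(&storage);  // correctly aligned, but is there a T object here?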
I view structs as a way of representing data types and treat the way the compiler lays them out in memory as an implementation detail, not as a wire format for data exchange to be relied upon, but I am open to being wrong.
[1] www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf
The dicey part is not so much the memory alignment, but rather the lifetime of the T object. When you reinterpret_cast<> memory as a T pointer, that does not create an instance of the object, and using it as if it were one would lead to Undefined Behavior.
In C++, all objects have to come into being and stop existing, thus defining their lifetime. That even applies to basic data types like int and float. The only exception to this is char.
In other words, what's legal is to copy the bytes from the buffer into an already existing object, like so:
char buffer[sizeof(T)];
// fill the buffer...
T value;
std::memcpy(&value, buffer, sizeof(T));
Don't worry about performance. The compiler will optimize all that away.
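Putting it together, the receive function from the question could be written like this (a sketch under the same assumptions as above, with error handling still omitted):

#include <cstring>
#include <unistd.h>

auto receive_some_t(int sock, T* p) -> void
{
    char buffer[sizeof(T)];
    read(sock, buffer, sizeof(T));      // fill the buffer from the socket
    std::memcpy(p, buffer, sizeof(T));  // copy the bytes into the existing T object
}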