Let's say I have an application that keeps receiving the byte stream from the socket. I have the documentation that describes what the packet looks like. For example, the total header size, and total payload size, with the data type corresponding to different byte offsets. I want to parse it as a struct
. The approach I can think of is that I will declare a struct
and disable the padding by using some compiler macro, probably something like:
struct Payload
{
char field1;
uint32 field2;
uint32 field3;
char field5;
} __attribute__((packed));
and then I can declare a buffer and memcpy
the bytes to the buffer and reinterpret_cast
it to my structure. Another way I can think of is that process the bytes one by one and fill the data into the struct
. I think either one should work but it is kind of old school and probably not safe.
The reinterpret_cast approach mentioned, should be something like:
void receive(const char*data, std::size_t data_size)
{
if(data_size == sizeof(payload)
{
const Payload* payload = reinterpret_cast<const Payload*>(data);
// ... further processing ...
}
}
I'm wondering are there any better approaches (more modern C++ style? more elegant?) for this kind of use case? I feel like using metaprogramming should help but I don't have an idea how to use it.
Can anyone share some thoughts? Or Point me to some related references or resources or even relevant open source code so that I can have a look and learn more about how to solve this kind of problem in a more elegant way.
There are many different ways of approaching this. Here's one:
Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.
Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.
In this approach, we:
read_into(src&, dst&)
function that takes in a source of raw bytes, as well as an object to populate.read_into()
on each field in the order expected on the wire.#include <cstdint>
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>
// Use std::byteswap when available. In the meantime, just lift the implementation from
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
std::ranges::reverse(value_representation);
return std::bit_cast<T>(value_representation);
}
template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
{x.read(dst, size)};
};
// General read implementation for all arithmetic types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
src.read(reinterpret_cast<char*>(&dst), sizeof(dst));
if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
dst = byteswap(dst);
}
}
struct Payload
{
char field1;
std::uint32_t field2;
std::uint32_t field3;
char field5;
};
// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
read_into(src, dst.field1);
read_into<std::endian::little>(src, dst.field2);
read_into(src, dst.field3);
read_into(src, dst.field5);
}
// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
// char field1;
// std::uint32_t field2;
// std::uint32_t field3;
// char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
// src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }
// Example
struct some_data_source {
std::size_t read(char*, std::size_t size);
};
void foo() {
some_data_source data;
Payload p;
read_into(data, p);
}
An alternative API could have been dst.field2 = read<std::uint32_t>(src)
, which has the drawback of requiring to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.
see it in action on godbolt: https://gcc.godbolt.org/z/77rvYE1qn