Tags: c++, struct, unions, raw-data

Best way to interpret a byte array as a struct in C++


What is the most efficient and elegant way to interpret a byte array as a structured object in modern C++? My first naive attempt was to use a bit field. Here is an example that hopefully illustrates the purpose and the difficulty of the task:

#include <cstdint>
#include <iostream>

union Data {
    uint8_t raw[2];
    struct __attribute__((packed)) {
        unsigned field1: 4, field2: 2, field3: 1, field4: 2;
        unsigned field5: 7;
    } interpreted;
};


int main() {
    static_assert(sizeof(Data) == 2);
    Data d{.raw{0x84, 0x01}};
    std::cout << d.interpreted.field1 << std::endl;
    std::cout << d.interpreted.field4 << std::endl;
    std::cout << d.interpreted.field5 << std::endl;
}

This approach is computationally efficient, but it is not portable: the in-memory order of the bit fields is implementation-defined and difficult to predict.

Output on i386/gcc11:

4
3
0

The low nibble of 0x84 (4) ended up in field1, the least significant bit of 0x01 combined with the top bit of 0x84 to form field4, and field5 got only zero bits. Is there a better way? Perhaps a solution that sacrifices some processing efficiency for maintainability and portability?
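
For reference, the portable baseline I would compare against extracts each field with shifts and masks, so the field positions come from the data format itself rather than from the compiler's bit-field packing (the names Fields and decode here are mine):

#include <cstdint>

struct Fields {
    unsigned field1, field2, field3, field4, field5;
};

Fields decode(const uint8_t (&raw)[2]) {
    // Assemble the 16-bit value explicitly as little-endian, then slice it.
    const uint16_t v = uint16_t(raw[0]) | uint16_t(raw[1]) << 8;
    return {
        v & 0xFu,          // field1: bits 0-3  -> 4
        (v >> 4) & 0x3u,   // field2: bits 4-5  -> 0
        (v >> 6) & 0x1u,   // field3: bit 6     -> 0
        (v >> 7) & 0x3u,   // field4: bits 7-8  -> 3
        (v >> 9) & 0x7Fu,  // field5: bits 9-15 -> 0
    };
}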


Solution

  • One problem is that reading from an inactive union member (type punning) is undefined behavior in C++, though some compilers allow it as an extension. Another problem is that bit-field layout is not UB but implementation-defined. That said, most compilers allocate bit fields starting from the least significant bits and allow fields to span byte boundaries; it's just not guaranteed by the standard, only documented in each compiler's ABI.

    One way to do this safely and efficiently is with a separate conversion function that returns a Data object via std::bit_cast, plus a test, executed once at runtime, that verifies the implementation and fails if the layout is not the expected one, perhaps by throwing an exception.

    #include <cstdint>
    #include <iostream>
    #include <bit>
    
    // bits MSB->LSB: 0000000'11'0'00'0100 = field5'field4'field3'field2'field1 for bytes { 0x84, 0x01 }
    struct Data {
        uint16_t field1 : 4, field2 : 2, field3 : 1, field4 : 2;
        uint16_t field5 : 7;
    };
    
    Data to_Data(uint8_t(&a)[2]) {
        return std::bit_cast<Data>(a);
    }
    
    // returns true if the implementation is OK
    // fails to compile if sizeof(Data) != 2, because std::bit_cast
    // requires the source and destination to have equal size
    bool test_Data_implementation()
    {
        uint8_t a[2]{ 0x84, 0x01 };
        Data d = std::bit_cast<Data>(a);
        return d.field1 == 4 && d.field4 == 3 && d.field5 == 0;
    }
    
    int main() {
        if (test_Data_implementation())
            std::cout << "Implementation passes\n";
        else
            std::cout << "Implementation fails\n";
        uint8_t a[2]{ 0x84, 0x01 };
        Data d = to_Data(a);
        std::cout << d.field1 << std::endl;
        std::cout << d.field4 << std::endl;
        std::cout << d.field5 << std::endl;
        //4
        //3
        //0
    }
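
    In pre-C++20 code, where std::bit_cast is unavailable, the same conversion can be written with std::memcpy; the byte copy is equally well-defined and compilers typically optimize it to the same code. A minimal sketch (the name to_Data_memcpy is mine):

    #include <cstring>
    
    Data to_Data_memcpy(const uint8_t(&a)[2]) {
        Data d;
        std::memcpy(&d, a, sizeof d);  // defined behavior: plain byte copy, no aliasing issues
        return d;
    }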
    

    I also made a constexpr, self-executing lambda that adds no runtime code: it checks at compile time that bit fields are packed this way, since that layout, while very common, is implementation-defined. The advantage, aside from being a compile-time check, is that it doesn't add anything to the global (or local) namespace. Adding it to any function that is compiled will check the bit-field implementation and little-endian state of the compiler. I originally did this because it wound up simplifying some decoding of ICC (International Color Consortium) profile structures, which are defined as binary objects.

    []() {
        constexpr uint16_t i = 0b0000'0001'0000'1101;
        struct A {uint16_t i0 : 2, i1 : 3, i2 : 4, i3 : 7; };
        constexpr A a{ std::bit_cast<A>(i) };
        static_assert(a.i0 == 1 && a.i1 == 3 && a.i2 == 8 && a.i3 == 0);
    }();
    

    Quick note: Clang hasn't yet implemented constexpr std::bit_cast for types containing bit fields; it's an outstanding bug. MSVC and GCC have. For those using MSVC, IntelliSense, which uses Clang, puts red squiggles in some of the code, but it still compiles just fine with MSVC.
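
    Until Clang catches up, a hedged workaround is to keep the compile-time check where it is supported and fall back to the runtime test on Clang. A sketch, reusing test_Data_implementation from above; place it at the top of main() or any always-compiled function:

    #if defined(__clang__)
        // Clang: no constexpr bit_cast over bit fields yet, so verify at startup
        // (needs <stdexcept> for std::runtime_error).
        if (!test_Data_implementation())
            throw std::runtime_error("unexpected bit-field layout");
    #else
        []() {
            constexpr uint16_t i = 0b0000'0001'0000'1101;
            struct A { uint16_t i0 : 2, i1 : 3, i2 : 4, i3 : 7; };
            constexpr A a{ std::bit_cast<A>(i) };
            static_assert(a.i0 == 1 && a.i1 == 3 && a.i2 == 8 && a.i3 == 0);
        }();
    #endif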