c++ unions bit-fields

Defined behaviour for union with 24-bit and 8-bit vars


I'm trying to find the best way to pack a 24-bit and 8-bit unsigned integer together into 32 bits without requiring bit-shifts to extract data. Unions immediately came to mind with a simple approach looking like this:

union {
    uint32_t u24;
    uint8_t u8[4]; // use only u8[3]
};

However, the behavior of this approach depends on the system's endianness, so I came up with the following approach, which uses the C++20 feature std::endian to detect the system's endianness at compile time:

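#include <bit>
#include <cstdint>

struct UnionTest {
    union {
        uint32_t u24;
        uint8_t u8[4];
    };

    // Index of the one byte that the low 24 bits of u24 do not occupy.
    constexpr uint8_t get_u8_index() const noexcept {
        static_assert(std::endian::native == std::endian::little ||
                      std::endian::native == std::endian::big,
                      "mixed-endian platforms are not supported");
        // Little-endian stores the most significant byte last, big-endian first.
        return std::endian::native == std::endian::little ? 3 : 0;
    }
};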

// use like this:
int main() {
    UnionTest test;
    test.u24 = 0xffffff;
    test.u8[test.get_u8_index()] = 0xff;
}

This may be a bit verbose still, but that's not the issue. I am purely interested in the viability of this approach, assuming we never write values larger than 24 bits into u24.

A different way would be to use bit-fields:

struct UnionTest {
    uint32_t u24 : 24;
    uint32_t u8 : 8;
};

But this may result in 64 bits rather than 32 (even though it should be expected to be 32 in most cases).
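
The size concern can be checked at compile time rather than assumed. The following is only a minimal sketch, assuming a C++17 compiler; Packed24_8 and make_packed are illustrative names, not part of any library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name; same members as the bit-field struct above.
struct Packed24_8 {
    std::uint32_t u24 : 24;
    std::uint32_t u8  : 8;
};

// Bit-field allocation is implementation-defined, so verify the size
// assumption at compile time instead of trusting it.
static_assert(sizeof(Packed24_8) == sizeof(std::uint32_t),
              "bit-fields were not packed into 32 bits by this compiler");

// Hypothetical helper: builds a packed value from its two parts.
inline Packed24_8 make_packed(std::uint32_t u24, std::uint8_t u8) noexcept {
    assert(u24 <= 0xffffffu); // caller promises a 24-bit value
    Packed24_8 p{};
    p.u24 = u24;
    p.u8 = u8;
    return p;
}
```

If the static_assert fires, the compiler has chosen a layout other than the expected 32 bits, and the build stops instead of silently using more memory.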

My question is about A) the feasibility of the union approach with regard to performance and potential undefined behavior, and B) the actual difference between the proposed union approach and C++ bit-fields.


Solution

  • The C++ language allows access to the byte representation of any object. This is explicitly relied upon to allow byte-wise copies of trivially copyable types. Furthermore, if the endianness is known, you can expect the 24-bit value to occupy the three lowest-addressed bytes on a little-endian system and the three highest-addressed bytes on a big-endian one. A mask is still required to read the 24-bit value, but the 8-bit one can be accessed directly, and no shift is ever used.

    Here is a possible implementation demonstrating that:

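    #include <bit>
    #include <cstdint>
    #include <iostream>
    
    namespace {
        // Index of the byte holding the 8-bit value (the most significant byte).
        constexpr uint8_t get_u8_index() noexcept {
            static_assert(std::endian::native == std::endian::little ||
                          std::endian::native == std::endian::big,
                          "mixed-endian platforms are not supported");
            return std::endian::native == std::endian::little ? 3 : 0;
        }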
    }
    
    class pack_24_8 {
        uint32_t value = 0;  // zero-initialized so no indeterminate bytes are ever read
    
        static constexpr int u8_index = get_u8_index();  // locally scoped constant
    
    public:
        uint8_t get_u8() const {
            return reinterpret_cast<const uint8_t*>(&value)[u8_index]; // extract one single byte
        }
    
        void set_u8(uint8_t c) {
            reinterpret_cast<uint8_t*>(&value)[u8_index] = c;  // set one single byte
        }
    
        uint32_t get_u24() const {
            return value & 0xffffff;      // get the least significant 24 bits
        }
    
        void set_u24(uint32_t u24) {
            uint8_t u8 = get_u8();    // save the u8 part
            value = u24;
            set_u8(u8);               // and restore it
        }
    };
    
    // use like this:
    int main() {
        pack_24_8 test;
        test.set_u8(0x5a);
        test.set_u24(0xa5a5a5);
    
        std::cout << std::hex << static_cast<unsigned int>(test.get_u8()) << " - "
                  << test.get_u24() << '\n';
    
        return 0;
    }
    

    Beware: as @Caleth pointed out in a comment, this relies on uint8_t being an alias for unsigned char. AFAIK this is true for every common architecture, but the standard does not require it.
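
The uint8_t caveat can be turned into a compile-time check, so that an unusual platform fails to build instead of misbehaving. This is only a minimal sketch, assuming C++17; byte_at is a hypothetical helper introduced for illustration and is not part of the class above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Compile-time guard: stop the build if uint8_t is not unsigned char,
// since only char, unsigned char and std::byte may alias arbitrary objects.
static_assert(std::is_same_v<std::uint8_t, unsigned char>,
              "byte access through uint8_t* assumes uint8_t is unsigned char");

// Hypothetical helper: reads one byte of a 32-bit value through
// unsigned char, which is always allowed regardless of the guard above.
inline unsigned char byte_at(const std::uint32_t& value, std::size_t index) noexcept {
    assert(index < sizeof(value)); // stay inside the object
    return reinterpret_cast<const unsigned char*>(&value)[index];
}
```

Accessing the bytes through unsigned char directly, as byte_at does, sidesteps the question entirely; the static_assert is only needed when the code keeps using uint8_t pointers.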