clanguage-lawyerundefined-behaviorc11bit-fields

Is this a well-defined way to access bitfield bits by index in C11


The order of bits in C11 bitfields is implementation-defined, and so is any padding in between them. This makes it difficult to access the bitfields by index at runtime, even though most implementations order the bitfields in a straightforward order.

This code attempts to avoid any undefined behavior by using a constant array containing the bit positions, and by using memcpy to access individual bytes of the bitfield:

// Generic type for up to 8 bitfields
typedef struct {
    bool b0: 1;
    bool b1: 1;
    bool b2: 1;
    bool b3: 1;
    bool b4: 1;
    bool b5: 1;
    bool b6: 1;
    bool b7: 1;
} bitfield8_t;

// Order of bits in a bitfield is implementation-defined
// This constant array makes the code portable
static const bitfield8_t bitorder[8] =
{
    {1,0,0,0,0,0,0,0},
    {0,1,0,0,0,0,0,0},
    {0,0,1,0,0,0,0,0},
    {0,0,0,1,0,0,0,0},
    {0,0,0,0,1,0,0,0},
    {0,0,0,0,0,1,0,0},
    {0,0,0,0,0,0,1,0},
    {0,0,0,0,0,0,0,1},
};

// Set bit in bitfield by index
// bitpos must be in range 0..7
void set_bit(bitfield8_t *x, int bitpos)
{
    // Reserve temporary storage for individual bytes
    // Normally it's just one byte, but compiler is
    // allowed to add padding.
    uint8_t x_bytes[sizeof(bitfield8_t)];
    uint8_t s_bytes[sizeof(bitfield8_t)];
    
    // Get the bit that corresponds to the index
    bitfield8_t s = bitorder[bitpos];
    
    // Copy fields to bytes so that we can modify
    memcpy(x_bytes, x, sizeof(bitfield8_t));
    memcpy(s_bytes, &s, sizeof(bitfield8_t));
    
    // Bitwise-OR each byte (typically just one)
    for (int i = 0; i < sizeof(bitfield8_t); i++)
    {
        x_bytes[i] |= s_bytes[i];
    }
    
    // Copy result back to bitfield
    memcpy(x, x_bytes, sizeof(bitfield8_t));
}

Then in the actual usage, the bitfield bits would have different names and possibly not all 8 are included:

// Actual code would have different names for
// the bitfield bits in a larger structure.
typedef struct {
    int some_field;
    uint16_t some_other_field;

    union {
        bitfield8_t bitfield_raw;
        struct {
            bool fancy_flag: 1;
            bool silly_flag: 1;
        };
    };

    uint8_t more_fields;
} myfancystruct_t;

int main()
{
    myfancystruct_t mfs = {};

    set_bit(&mfs.bitfield_raw, 1);

    printf("fancy_flag: %d\n", mfs.fancy_flag); // Prints 0
    printf("silly_flag: %d\n", mfs.silly_flag); // Prints 1

    return 0;
}

This optimizes nicely enough on most architectures: (View on Compiler Explorer)

set_bit:
        movsx   rsi, esi
        mov     al, BYTE PTR bitorder[rsi]
        or      BYTE PTR [rdi], al
        ret

But is this well-defined behavior by C11 standard?

My interpretation is that:

  1. C11 section 6.5.2.3 clause 6 allows "common initial sequence" in union, including bit-fields, which makes the bitfield_raw access ok.
  2. Section 6.2.6.1 clause 3 says that "unsigned bit-fields .. shall be represented using a pure binary notation", which makes the bitwise-or work for setting the bitfield. It might also guarantee that sìzeof(bitfield8_t) == 1, but that is not so important.
  3. Section 6.2.6.1 clause 4 allows memcpy from struct to bytes and back. It excludes bit-fields, but I think structs containing bit-fields is fine.

I am aware that there are many ways to implement bit flags in C, and I'm still just considering my options. To make this question answerable, I would like to focus it on whether the shown approach is well-defined. Any answer that points out the part of standard that makes this undefined behavior is greatly appreciated!


Solution

  • The order of bits in C11 bitfields is implementation-defined, and so is any padding in between them.

    There are specified, unspecified, and implementation-defined aspects to that, but it's more unspecified than anything else. Still, I guess your point is that different implementations behave differently, which is true.

    This makes it difficult to access the bitfields by index at runtime, even though most implementations order the bitfields in a straightforward order.

    All implementations order bitfields in a straightforward manner. But that's not very relevant, because doing anything that depends on the storage order of bitfields is bad news. In fact, most other uses of bitfields are bad news, too. Generally speaking, people who want to set up a bit array with elements accessed by index don't use bitfields to do it. They use a flat integer, with bits read or written via shifting and masking:

    void set_bit(uint8_t *x, unsigned bitpos) {
        *x = *x | (1u << bitpos);
    }
    
    void clear_bit(uint8_t *x, unsigned bitpos) {
        *x = *x & ~(1u << bitpos);
    }
    
    unsigned get_bit(uint8_t x, unsigned bitpos) {
        return (x >> bitpos) & 1;
    }
    

    But is this well-defined behavior by C11 standard?

    My interpretation is that:

    • C11 section 6.5.2.3 clause 6 allows "common initial sequence" in union, including bit-fields, which makes the bitfield_raw access ok.

    As shown in your example, yes. The common initial sequence of two or more structures can include or consist entirely of bitfields. They can then be accessed via the union.

    • Section 6.2.6.1 clause 3 says that "unsigned bit-fields .. shall be represented using a pure binary notation", which makes the bitwise-or work for setting the bitfield. [...]

    I don't think you need to rely on that for bitfields of width 1, but OK.

    • Section 6.2.6.1 clause 4 allows memcpy from struct to bytes and back. It excludes bit-fields, but I think structs containing bit-fields is fine.

    6.2.6.1/4 is not an enabling provision for that. It is about defining the term object representation. Any object that is not declared with register storage class can be copied via memcpy, subject to the specifications of that function. Bitfields are not objects, but structures and unions containing them are, so yes, you can use memcpy to copy those.

    So yes, the code presented has well defined behavior.

    But I urge you not to do that. It is much larger and more complicated than the typical alternative. It goes out of its way to use bitfields at all, which is exactly the wrong direction. The best solution to bitfield details being unspecified is to avoid using bitfields. Especially so when to do otherwise requires building a complex framework and imposing additional structure on top of them.