c++ c language-lawyer bit-fields truncation

Inconsistent truncation of unsigned bitfield integer expressions between C++ and C in different compilers



I was debugging a strange test failure: a function that previously lived in a C++ source file started returning incorrect results after it was moved verbatim into a C file. The minimal example (MVE) below reproduces the problem with GCC. However, when I compiled the example with Clang on a whim (and later with Visual Studio), I got a different result! I cannot figure out whether to treat this as a bug in one of the compilers or as a manifestation of undefined behavior permitted by the C or C++ standard. Strangely, none of the compilers gave me any warnings about the expression.

The culprit is this expression:

ctl.b.p52 << 12;

Here, p52 is declared as a 52-bit bit-field of type uint64_t; it sits in a struct that is a member of a union (see control_t below). The shift does not lose any data, as the result still fits into 64 bits. However, when the code is compiled as C, GCC truncates the result to 52 bits! When it is compiled as C++, all 64 bits of the result are preserved.

To illustrate this, the example program below compiles two functions with identical bodies, and then compares their results. c_behavior() is placed in a C source file and cpp_behavior() in a C++ file, and main() does the comparison.

Repository with the example code: https://github.com/atakua/c-cpp-bitfields

The header common.h defines a union of a 64-bit integer and a struct of bit-fields totaling 64 bits, and declares two functions:

#ifndef COMMON_H
#define COMMON_H

#include <stdint.h>

typedef union control {
        uint64_t q;
        struct {
                uint64_t a: 1;
                uint64_t b: 1;
                uint64_t c: 1;
                uint64_t d: 1;
                uint64_t e: 1;
                uint64_t f: 1;
                uint64_t g: 4;
                uint64_t h: 1;
                uint64_t i: 1;
                uint64_t p52: 52;
        } b;
} control_t;

#ifdef __cplusplus
extern "C" {
#endif

uint64_t cpp_behavior(control_t ctl);
uint64_t c_behavior(control_t ctl);

#ifdef __cplusplus
}
#endif

#endif // COMMON_H

The functions have identical bodies, except that one is compiled as C and the other as C++.

c-part.c:

#include <stdint.h>
#include "common.h"
uint64_t c_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

cpp-part.cpp:

#include <stdint.h>
#include "common.h"
uint64_t cpp_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

main.c:

#include <stdio.h>
#include "common.h"

int main() {
    control_t ctl;
    ctl.q = 0xfffffffd80236000ull;

    uint64_t c_res = c_behavior(ctl);
    uint64_t cpp_res = cpp_behavior(ctl);
    const char *announce = c_res == cpp_res? "C == C++" : "OMG C != C++";
    printf("%s\n", announce);

    return c_res == cpp_res? 0: 1;
}

With GCC, the two functions return different results:

$ gcc -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
OMG C != C++

However, with Clang, the C and C++ versions behave identically and as expected:

$ clang -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
C == C++

With Visual Studio I get the same result as with Clang:

C:\Users\user\Documents>cl main.c c-part.c cpp-part.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24234.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
c-part.c
Generating Code...
Compiling...
cpp-part.cpp
Generating Code...
Microsoft (R) Incremental Linker Version 14.00.24234.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
c-part.obj
cpp-part.obj

C:\Users\user\Documents>main.exe
C == C++

I tried the examples on Windows, even though the original problem with GCC was discovered on Linux.


Solution

  • C and C++ treat the types of bit-field members differently.

    C 2018 6.7.2.1 10 says:

    A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits…

    Observe this is not specific about the type—it is some integer type—and it does not say the type is the type that was used to declare the bit-field, as in the uint64_t a : 1; shown in the question. This apparently leaves it open to the implementation to choose the type.

    C++ 2017 draft n4659 12.2.4 [class.bit] 1 says, of a bit-field declaration:

    … The bit-field attribute is not part of the type of the class member…

    This implies that, in a declaration such as uint64_t a : 1;, the : 1 is not part of the type of the class member a, so the type is as if it were uint64_t a;, and thus the type of a is uint64_t.

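    So it appears that, in C, GCC gives a bit-field an integer type of exactly the declared width (52 bits for p52, which is too wide to be promoted to int, so the shift is evaluated in 52 bits and the result is truncated), whereas in C++ it gives the bit-field its declared type (uint64_t). Neither choice appears to violate the standards.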