c++ struct c++20 abi memory-layout

What is the correct mental model for [[no_unique_address]] in C++?


I recently found out about the [[no_unique_address]] attribute in C++. According to cppreference.com:

Applies to the name being declared in the declaration of a non-static data member that is not a bit-field. Makes this member subobject potentially-overlapping, i.e., allows this member to be overlapped with other non-static data members or base class subobjects of its class. This means that if the member has an empty class type (e.g. stateless allocator), the compiler may optimize it to occupy no space, just like if it were an empty base. If the member is not empty, any tail padding in it may be also reused to store other data members.

I think I understand the case for empty class types, or at least I understand the examples provided. But this part:

If the member is not empty, any tail padding in it may be also reused to store other data members.

is what confuses me.

As far as I understand, tail padding is the memory at the end of an object that is added to satisfy its alignment requirements and is not used by any of the object's subobjects. For example:

struct alignas(8) Test {
    char x;
    bool y;
};

Here, since sizeof(char) == 1 and sizeof(bool) == 1, a Test instance needs only 2 bytes, which can be confirmed by removing the alignas(8) requirement. The alignment requirement should therefore force the compiler to add 6 bytes of tail padding to Test. But for some reason the compiler is not allowed to use those bytes for overlapped storage unless at least one member of Test is annotated with [[no_unique_address]]. For example:
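The 6-byte figure can be checked at compile time. This is only a minimal sketch: tail_padding_bytes is a hypothetical helper name, and it assumes sizeof(bool) == 1, which holds on mainstream compilers but is not guaranteed by the standard.

```cpp
#include <cstddef>

// Same struct as in the question.
struct alignas(8) Test {
    char x;
    bool y;
};

// Hypothetical helper: number of bytes between the end of the last
// member (y) and the end of the object.
constexpr std::size_t tail_padding_bytes() {
    return sizeof(Test) - (offsetof(Test, y) + sizeof(bool));
}

// alignas(8) raises the alignment, and sizeof is always a multiple of it.
static_assert(alignof(Test) == 8, "alignas(8) sets the alignment");
static_assert(sizeof(Test) % alignof(Test) == 0,
              "size is a multiple of the alignment");
```

Under the stated assumption, tail_padding_bytes() evaluates to 6: sizeof(Test) is 8 and the two members occupy bytes 0 and 1.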

struct alignas(8) Test {
    char x;
    bool y;
};

struct Combine {
    [[no_unique_address]] Test test;
    bool flag;
};

int main() {
    return sizeof(Combine);
}

returns 16, while this one:

struct alignas(8) Test {
    [[no_unique_address]] char x;  // <-- Marked `[[no_unique_address]]` now.
    bool y;
};

struct Combine {
    [[no_unique_address]] Test test;
    bool flag;
};

int main() {
    return sizeof(Combine);
}

returns 8. This confuses me. In my understanding, when Combine.test is annotated with [[no_unique_address]], its tail padding should be allowed to store other data members because it is not empty, as the cited explanation states. But that only seems to happen if at least one of the members of the Test struct is also annotated with [[no_unique_address]]. Why is that? What is the correct mental model for what [[no_unique_address]] is actually doing?
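To compare the two variants in one translation unit, something like the following sketch could be used. The names PlainTest, OverlapTest, CombinePlain, CombineOverlap, plain_size and overlap_size are made up for this illustration. The only thing asserted is the relationship that should hold regardless of compiler; the concrete sizes depend on the ABI.

```cpp
#include <cstddef>

// First variant from the question: no annotated member inside.
struct alignas(8) PlainTest {
    char x;
    bool y;
};

// Second variant: one member is marked [[no_unique_address]].
struct alignas(8) OverlapTest {
    [[no_unique_address]] char x;
    bool y;
};

struct CombinePlain {
    [[no_unique_address]] PlainTest test;
    bool flag;
};

struct CombineOverlap {
    [[no_unique_address]] OverlapTest test;
    bool flag;
};

// Hypothetical helpers so the sizes can be inspected or printed.
constexpr std::size_t plain_size() { return sizeof(CombinePlain); }
constexpr std::size_t overlap_size() { return sizeof(CombineOverlap); }

// Reusing tail padding can only shrink the enclosing struct, never grow it.
static_assert(overlap_size() <= plain_size(),
              "the overlap variant is never larger");
```

With the compiler I used for the examples above, plain_size() is 16 and overlap_size() is 8. A compiler that does not reuse the padding would report 16 for both.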


Solution

  • GCC and Clang lay out structs according to the Itanium C++ ABI.

    It defines a concept called POD for the purpose of layout.

    Essentially, if a class could have been a POD (standard layout) in C++98, it will be laid out as if it were in C++98, where tail-padding reuse was not allowed. (alignas(8) is the C++11 spelling of __attribute__((__aligned__(8))), so the equivalent was already available in C++98.) This is for backwards compatibility (and compatibility with C), so even though newer standards allow the tail padding of standard-layout types to be reused, GCC won't do it.

    When any member of a class is marked [[no_unique_address]], the class is no longer POD for the purpose of layout, because it has potentially-overlapping non-static data members. Its tail padding can then be reused.

    The fact that Clang doesn't do this properly is a Clang bug: https://github.com/llvm/llvm-project/issues/50766
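The same "POD for the purpose of layout" rule can be seen with base classes, which are also potentially-overlapping subobjects. The sketch below is an illustration only: all type and function names are hypothetical, it assumes sizeof(int) == 4, and the smaller size mentioned in the comments is what I would expect from a compiler following the Itanium C++ ABI, not something the standard guarantees.

```cpp
#include <cstddef>

// An aggregate with only public members: a POD under the C++98 rules,
// so its tail padding is not reused.
struct PodBase {
    int i;
    char c;  // followed by 3 bytes of tail padding
};
struct FromPod : PodBase {
    char d;  // placed after the padding of PodBase
};

// A user-provided constructor makes the class a non-POD under the
// C++98 rules, so an Itanium-ABI compiler may place d in the padding.
struct NonPodBase {
    NonPodBase() {}
    int i;
    char c;
};
struct FromNonPod : NonPodBase {
    char d;  // may share the tail padding of NonPodBase
};

// Hypothetical helpers so the sizes can be inspected or printed.
constexpr std::size_t from_pod_size() { return sizeof(FromPod); }
constexpr std::size_t from_non_pod_size() { return sizeof(FromNonPod); }

// Reusing padding can only make the derived class smaller.
static_assert(from_non_pod_size() <= from_pod_size(),
              "padding reuse never grows the class");
```

Under those assumptions, from_pod_size() is 12, and on an Itanium-ABI compiler from_non_pod_size() would typically be 8. Marking a member [[no_unique_address]] has the same effect on the class as the constructor does here: it takes the class out of the C++98 POD category, so its padding becomes available.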