c++ language-lawyer pointer-arithmetic

Would P1839 make it possible to access subobjects from offsets into object representations?


This is in one sense an extension to "Is it UB to access a subobject by adding byte offset to the address of the enclosing object?", under the assumption that P1839 is adopted.

Consider the following code:

#include <cstddef> // std::byte, offsetof

struct V2
{
    float x;
    float y;

    float& f()
    {
        // If significant, substitute char* or unsigned char* for std::byte
        std::byte* self = reinterpret_cast<std::byte*>(this); // (1)
        std::byte* rep_y = self + offsetof(V2, y); // (2)
        float* ptr_y = reinterpret_cast<float*>(rep_y); // (3)
        return *ptr_y; // (4)
    }
};

Previous discussions on whether code such as this is valid have centred around the fallout of inspecting object representations post-C++17, where the consensus appears to be that formally, the pointer arithmetic in (2) is undefined behaviour due to how object representations are specified.

Fixing iteration over the object representation appears to be the primary motivation of P1839, such that (2) is no longer UB. This necessarily requires introducing wording around nesting and subobjects, and it's at this point that I find I'm not well-versed enough in the standard/its terminology to conclude whether f as implemented above is valid.

I'm aware that P1839 includes the following footnote, but this appears to be talking about the cases in which offsets are applied outside of current reachability constraints, which is not the case here?

These reachability-based restrictions limit compatibility between C and C++, in particular when it comes to C code that uses offsetof to implement intrusive data structures. A separate paper is being prepared that proposes to remove these restrictions. Additional specification difficulties are raised by such a direction, which will not be discussed here.

Firstly, in the case of no offset being applied, am I correct in saying that it is the pointer-interconvertibility point "one is the object representation of the other, or the first element thereof" that allows for the reinterpret_cast between a T* of this, its object representation as (say) std::byte*, and then back to T* (with value accessibility)?

Hence, or otherwise, can the pointer rep_y as obtained via (2) be interpreted as the first element of the object representation of y, and therefore permit access to the value of y through the suitably casted pointer (3)? Based on the above, this would be the case if initially obtained via &y, but does this still hold true when formed from the object representation of V2? Why/why not?

If it is not technically the (first element of the) object representation of y, or if there is some other key point I have missed, do the modifications to the definition of std::launder imply that a subsequent laundering would provide the correct result? Loosely, it appears to point to a float object reachable from the original pointer, so would this work? Is there any complication due to the fact that we've constructed a pointer into (part of) an object representation, rather than a pointer to the intended object (or sub-object)?

Lastly, if none of this holds, is there any code similar to this that either is valid, or is intended to be permitted (in a similar spirit to P1839's object representation inspection loop)?


Solution

  • I'm aware that P1839 includes the following footnote, but this appears to be talking about the cases in which offsets are applied outside of current reachability constraints, which is not the case here?

    No. The reachability condition is a precondition of std::launder. It essentially guarantees that std::launder cannot be used to gain access to bytes of memory that would not otherwise be accessible through reinterpret_cast, pointer arithmetic and member access.

    In your case all bytes of the V2 object are reachable through *this, including the y subobject. This extends to the object representation of *this as proposed.

    The problem arises in the other direction, as in the container_of macros typically used in C: suppose you have a pointer to this->y and you attempt to use the same approach to get back to this. Only the bytes of this->y are reachable through &(this->y). Because of a special exception for the first member of a standard-layout class, the same restriction does not apply to this->x.

    The footnote explains that the paper does not intend to circumvent these reachability conditions, which currently make a container_of macro impossible in C++.
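The first-member exception mentioned above can be illustrated with a minimal sketch. The helper names `owner_from_x` and `demo_owner_from_x` are hypothetical and introduced only for illustration. The cast relies on a standard-layout class object being pointer-interconvertible with its first non-static data member; no analogous helper for `y` is shown, because that direction is exactly what the reachability rules rule out.

```cpp
#include <cassert>
#include <type_traits>

struct V2
{
    float x;
    float y;
};

// The first-member exception only applies to standard-layout classes.
static_assert(std::is_standard_layout_v<V2>, "V2 must be standard-layout");

// Hypothetical helper: recovers the enclosing V2 from a pointer to its
// first member. A standard-layout class object and its first non-static
// data member are pointer-interconvertible, so the cast yields a pointer
// to the V2 object itself.
V2* owner_from_x(float* px)
{
    return reinterpret_cast<V2*>(px);
}

// Hypothetical demo of the helper above.
void demo_owner_from_x()
{
    V2 v{1.0f, 2.0f};
    V2* owner = owner_from_x(&v.x);
    assert(owner == &v);       // same object
    assert(owner->y == 2.0f);  // other members are accessible through it
}
```

The same trick applied to `&v.y` with a negative offset is the container_of pattern that the footnote leaves to a separate paper.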

    Firstly, in the case of no offset being applied, am I correct in saying that it is the pointer-interconvertibility point "one is the object representation of the other, or the first element thereof" that allows for the reinterpret_cast between a T* of this, its object representation as (say) std::byte*, and then back to T* (with value accessibility)?

    Not really. If you read the proposed changes to [expr.static.cast], pointer-interconvertibility is no longer the only condition. The proposed rules are much more complex. In the case of the cast from T* to std::byte*, the second-to-last item would usually apply:

    Otherwise, if T is cv std::byte or cv array of std::byte, let U be the type obtained from T by replacing std::byte with unsigned char. If a static_cast of the operand to U* would be well-formed and would yield a pointer to an object representation or element thereof, the result of the cast to T* is that pointer value.

    which then recurses back to the item

    Otherwise, if a’s object representation is an array A and T is cv unsigned char, the result is a pointer to the first element of a’s object representation.

    to imply that you get a pointer to the first element of the object representation. That is not true in all cases though, for example if T is unsigned char or std::byte itself or if there is a pointer-interconvertible object of type std::byte or unsigned char or similar type which is not part of a synthesized object representation. None of these special cases apply to your example.

    The cast back from the object representation would then (again with similar exceptions) have the item

    Otherwise, the result is a member of S whose complete object is not a synthesized object representation if any such result would give the program defined behavior. If there are multiple possible results that would give the program defined behavior, the result is an unspecified choice among them.

    apply so that you get back the original pointer value.
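The zero-offset round trip can be sketched as follows. The function name `round_trip` is hypothetical. As far as I can tell this particular round trip does not depend on P1839: under the currently published wording both casts simply leave the pointer value unchanged, so the result again points to the original object; the proposal changes what the intermediate `std::byte*` formally points to, not the end result.

```cpp
#include <cstddef>

struct V2
{
    float x;
    float y;
};

// Hypothetical helper: V2* -> std::byte* -> V2* with no offset applied.
// The returned pointer refers to the same V2 object as the argument.
V2* round_trip(V2* p)
{
    std::byte* bytes = reinterpret_cast<std::byte*>(p);
    return reinterpret_cast<V2*>(bytes);
}
```

Members of the object can then be read through the returned pointer exactly as through the original one.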

    Hence, or otherwise, can the pointer rep_y as obtained via (2) be interpreted as the first element of the object representation of y, and therefore permit access to the value of y through the suitably casted pointer (3)?

    No, rep_y is a pointer to an element of the synthesized object representation of the V2 object, not of the object representation of the y subobject. The cast to float* will not result in a pointer to the y subobject. That subobject is still not pointer-interconvertible with the elements of the object representation of the V2 object, nor is the object representation of the y subobject pointer-interconvertible with the elements of the V2 object's object representation (although the former is nested within the latter).

    Therefore only the last item of the proposed changes to [expr.static.cast] applies:

    Otherwise, the result is a pointer to a.

    As a consequence your code requires an extra std::launder call to have defined behavior if the result of f is actually accessed or bound to a reference:

    float* ptr_y = std::launder(reinterpret_cast<float*>(rep_y)); // (3)
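Putting the pieces together, the full member function with the extra `std::launder` call might look like the sketch below. It is defined only under the proposed P1839 wording as described in this answer; under the currently published standard the pointer arithmetic in (2) remains formally undefined, even though mainstream implementations are expected to produce the obvious result.

```cpp
#include <cstddef> // std::byte, offsetof
#include <new>     // std::launder

struct V2
{
    float x;
    float y;

    float& f()
    {
        // (1) pointer into the object representation of *this (per P1839)
        std::byte* self = reinterpret_cast<std::byte*>(this);
        // (2) arithmetic within the object representation
        //     (formally UB without P1839)
        std::byte* rep_y = self + offsetof(V2, y);
        // (3) the cast alone does not yield a pointer to the y subobject;
        //     std::launder obtains a pointer to the float object at that address
        float* ptr_y = std::launder(reinterpret_cast<float*>(rep_y));
        // (4) access the y subobject
        return *ptr_y;
    }
};
```

Because `f` returns a reference, both reads and writes go through the laundered pointer.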