c++ language-lawyer c++20 undefined-behavior reinterpret-cast

When is reinterpret_cast UB?


Does the code below contain undefined behavior (UB)?

#include <cstdint>   // std::uint8_t
#include <iostream>
#include <vector>

struct A
{
    int x;
    char y;
};

int main()
{
    std::vector<std::uint8_t> v(sizeof(A), 0);

    A* p = reinterpret_cast<A*>(v.data());

    p->x = 1;
    p->y = 'a';

    std::cout << p->x << " " << p->y << std::endl;
}

Should I use std::launder?

EDIT1

I found an example on cppreference.com where we cast void* to an implicit-lifetime type with reinterpret_cast:

template<std::size_t N>
struct MyAllocator
{
    std::byte data[N];
    std::size_t sz{N};
    void* p{data};

    MyAllocator() = default;

    // Note: only well-defined for implicit-lifetime types
    template<typename T>
    T* implicit_aligned_alloc(std::size_t a = alignof(T))
    {
        if (std::align(a, sizeof(T), p, sz))
        {
            T* result = std::launder(reinterpret_cast<T*>(p));
            p = static_cast<std::byte*>(p) + sizeof(T);
            sz -= sizeof(T);
            return result;
        }
        return nullptr;
    }
};

It is not UB, right? The answer mentions the special case of arrays that provide storage:

As a special case, objects can be created in arrays of unsigned char or std::byte (since C++17) (in which case it is said that the array provides storage for the object) if ...

From the answer:

Only unsigned char or std::byte array can provide storage


Solution

  • This question in fact touches on several delicate issues:

    implicit lifetime object creation

    A is an implicit-lifetime class. Constructing v may implicitly create an A object: std::vector obtains its storage through the default allocator, which allocates an array with ::operator new, and that may be enough to trigger implicit object creation, provided that the rest of the program is well-behaved when an A object is created there (see also intro.object#14).
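As a rough compile-time check of the "implicit-lifetime class" claim, here is a minimal sketch. The trait name `looks_implicit_lifetime_v` is hypothetical (it is not a standard trait), and it only tests a sufficient condition: an aggregate with a trivial destructor. A type can fail this check and still be an implicit-lifetime type.

```cpp
#include <type_traits>

struct A {
    int x;
    char y;
};

// Hypothetical helper, not part of the standard library.
// Sufficient (not necessary) condition: an aggregate whose destructor is
// trivial, hence not user-provided, is an implicit-lifetime class.
template <typename T>
inline constexpr bool looks_implicit_lifetime_v =
    std::is_aggregate_v<T> && std::is_trivially_destructible_v<T>;

// A is a plain aggregate with a trivial destructor, so the check passes.
static_assert(looks_implicit_lifetime_v<A>,
              "A should qualify as an implicit-lifetime class");
```

An aggregate with a user-provided destructor would be rejected by this check, which matches the rule that such a class is not implicit-lifetime.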

    storage reuse

    However, if such an object were created, it might reuse the storage (cppreference lifetime) and thereby end the lifetime of the std::vector's data, leaving the vector in an invalid state. That would prevent the implicit creation of A, and any access would then be UB because the object would not exist.

    Only an unsigned char or std::byte array can provide storage (see also cppreference lifetime/providing storage), thus preventing this end of lifetime.

    Can std::uint8_t be considered as an unsigned char or std::byte in this matter? I would say no in general (see A byte type: std::byte vs std::uint8_t vs unsigned char vs char vs std::bitset<8> for instance).

    You would have to check std::is_same_v<std::uint8_t, unsigned char> to see whether it is an alias, which would then have the same properties (https://eel.is/c++draft/dcl.typedef#1 and https://eel.is/c++draft/dcl.typedef#2). The corresponding check std::is_same_v<std::uint8_t, std::byte> can never succeed, because std::byte is a scoped enumeration and std::uint8_t names an integer type.
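The alias check can be written as compile-time constants. This is a minimal sketch; the names `uint8_is_unsigned_char`, `uint8_is_std_byte` and `may_provide_storage_v` are hypothetical and only illustrate the idea. It ignores cv-qualification of the element type.

```cpp
#include <cstddef>      // std::byte
#include <cstdint>      // std::uint8_t
#include <type_traits>  // std::is_same_v

// True on platforms where std::uint8_t is a typedef for unsigned char.
// This is common but not guaranteed, so it is not asserted here.
inline constexpr bool uint8_is_unsigned_char =
    std::is_same_v<std::uint8_t, unsigned char>;

// std::byte is a scoped enumeration, so an integer typedef can never name it.
inline constexpr bool uint8_is_std_byte =
    std::is_same_v<std::uint8_t, std::byte>;
static_assert(!uint8_is_std_byte);

// Hypothetical helper: is an array of Element allowed to provide storage?
template <typename Element>
inline constexpr bool may_provide_storage_v =
    std::is_same_v<Element, unsigned char> || std::is_same_v<Element, std::byte>;
```

With this, `static_assert(may_provide_storage_v<std::uint8_t>)` in front of the vector declaration would turn a platform where the alias does not hold into a compile error instead of silent UB.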

    std::launder usage

    What we should expect

    With reinterpret_cast alone, you can convert a pointer to another pointer type, but you may dereference the result only if the object it points to is type-accessible through it.

    If an A object has been properly created, then type accessibility (see the "Type accessibility" section of the cppreference reinterpret_cast page) is satisfied: you are forming an lvalue reference to a living object of the same type A.

    In this case, there is no need for std::launder.

    Note: I initially thought differently as cppreference is misleading on this point:

    Typical uses of std::launder include:
    ...
    Obtaining a pointer to an object created by placement new from a pointer to an object providing storage for that object.

    IMHO, this wording is confusing and is not in ptr.launder, whose example, I think, is clearer.
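To make the quoted "typical use" concrete, here is a small sketch of that exact situation: an object created by placement new in a std::byte array, then reached again from the array pointer. The function name `read_through_laundered_pointer` is hypothetical, and the local array stands in for any storage-providing buffer.

```cpp
#include <cstddef>  // std::byte
#include <new>      // placement new, std::launder

struct A {
    int x;
    char y;
};

// Hypothetical demo function: creates an A in byte storage, then recovers
// a pointer to it starting from the storage itself.
inline int read_through_laundered_pointer() {
    alignas(A) std::byte storage[sizeof(A)];  // array providing storage

    // The pointer returned by placement new can be used directly.
    A* direct = ::new (static_cast<void*>(storage)) A{7, 'z'};

    // Starting from the array instead, std::launder yields a pointer to the
    // A object that lives at that address.
    A* recovered = std::launder(reinterpret_cast<A*>(storage));

    return (direct == recovered) ? recovered->x : -1;
}
```

Whether the std::launder call is strictly required when going through the storage pointer is the point debated in the rest of this answer; the sketch only shows the conservative form.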

    But: open discussion

    There is an open discussion, P3006, which claims that there is a bug in the standard (and proposes a simple way to fix it).

    In short, another part of the standard defines the pointer-interconvertible property: two objects must have it for a pointer to one to be converted with reinterpret_cast into a pointer to the other. Arrays of bytes have been omitted from this definition.
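For contrast, here is a case that the pointer-interconvertible definition does cover: a standard-layout class object and its first non-static data member. This is a minimal sketch with a hypothetical function name, `read_first_member`; it does not involve byte arrays, which is precisely the case missing from the definition.

```cpp
#include <type_traits>  // std::is_standard_layout_v

struct A {
    int x;
    char y;
};

// The guarantee used below only holds for standard-layout classes.
static_assert(std::is_standard_layout_v<A>);

// An A object and its first member x are pointer-interconvertible, so the
// reinterpret_cast below yields a pointer to a.x.
inline int read_first_member(A& a) {
    int* px = reinterpret_cast<int*>(&a);
    return *px;
}
```

Calling `read_first_member` on an `A` initialized as `A{5, 'b'}` reads the value of `x` through the converted pointer.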

    In this case, it is unclear to me whether std::launder can be used to solve the issue (see below). Yet, I would think so: https://eel.is/c++draft/ptr.launder#2:

    Thus std::launder preconditions are fulfilled.

    https://eel.is/c++draft/ptr.launder#3:

    Returns: A value of type T* that points to X.

    In this case, it returns a pointer to A that points to the implicitly created object, so it won't hurt to use std::launder (the overhead, if any, is assuredly negligible).

    Notice that the main compilers seem to do the right thing with or without std::launder.

    Note: a playground to see that std::launder is not a no-op, at least with -O0.

    Alignment

    Alignment may be an issue in this case: the default allocation provides a suitably aligned address for most types (I treated this point elsewhere), but nothing requires the std::vector data to be placed at the beginning of that storage. The only constraint is that the address returned by data() is suitably aligned for the vector's element type.

    Thus you must check the alignment programmatically. If it is not valid, then the program is not "well behaved" and the implicit object creation does not happen: accessing it leads to UB.

    Note: the alignment must be checked before casting, otherwise the obtained pointer value would be unspecified.

    So to sum it up and to remove UB (the access of a non-existing object) you need:

    #include <cstddef>  // std::size_t, std::byte
    #include <iostream>
    #include <memory>     // std::align
    #include <stdexcept>  // std::runtime_error
    #include <vector>
    
    struct A {
        int x;
        char y;
    };
    
    int main() {
        // triggers implicit lifetime object creation (ILOC)
        std::vector<std::byte> v(sizeof(A), std::byte(0));
    
        // check for alignment
        void* ptest = v.data();
        std::size_t sz = v.size();
        if (nullptr == std::align(alignof(A), sizeof(A), ptest, sz)) {
            throw std::runtime_error("Badly aligned storage");
        }
    
        // gets a dereferenceable A*
        // not UB if ILOC occurred
    
        // with P3006 accepted
        A* p = reinterpret_cast<A*>(v.data());
    
        // otherwise
        // A* p = std::launder(reinterpret_cast<A*>(v.data()));
    
        // program is "well-behaved" so far
        // p is the location of a living `A` object
        p->x = 1;
        p->y = 'a';
    
        std::cout << p->x << " " << p->y << '\n';
    }
    


    You could probably learn more (and so could I) from this video.

    Note: in this video (around the 39-minute mark) it is claimed that std::launder is mandatory but, IMHO, it all depends on the status of P3006.

    I would also (strongly) recommend An (In-)Complete Guide to C++ Object Lifetimes

    Credits and special thanks (by order of comments)