This is a description of what I am trying to do with the code; skip to the next section to see the actual issue.
I want to use coroutines in an embedded system, where I can't afford too many dynamic allocations. Therefore, I am trying the following: I have non-copyable, non-movable awaitable types for the various queries to peripherals. When querying a peripheral, I use something like auto result = co_await Awaitable{params}. The constructor of the awaitable prepares the request to the peripheral, registers its internal buffer to receive the reply, and registers its ready flag in the promise. The coroutine is then suspended.
Later, the buffer will be filled, and the ready flag will be set to true. After this, the coroutine knows that it can be resumed, which then causes the awaitable to copy the result out of the buffer before being destroyed.
The awaitable is non-copyable and non-movable to force guaranteed copy elision everywhere, so that I can be sure that the pointers to buffer and ready remain valid until the awaitable has been awaited (at least that was the plan...)
I am encountering an issue with ARM GCC 11.3 in the following code:
#include <cstring>
#include <coroutine>

struct AwaitableBase {
    AwaitableBase() = default;
    AwaitableBase(const AwaitableBase&) = delete;
    AwaitableBase(AwaitableBase&&) = delete;
    AwaitableBase& operator=(const AwaitableBase&) = delete;
    AwaitableBase& operator=(AwaitableBase&&) = delete;
    char buffer[65];
};

struct task {
    struct promise_type
    {
        bool* ready_ptr;
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

struct Awaitable {
    AwaitableBase base;
    bool ready{false};
    bool await_ready() { return false; }
    void await_suspend(std::coroutine_handle<task::promise_type> handle)
    {
        handle.promise().ready_ptr = &ready;
    }
    int await_resume() { return 2; }
};

AwaitableBase make_awaitable_base()
{
    return AwaitableBase{};
}

task example()
{
    co_await Awaitable{make_awaitable_base()};
}

When compiling this with ARM GCC 11.3 without any optimizations, the code contains a memcpy call that moves around the AwaitableBase object (excerpt from Godbolt):
ldr r3, [r7, #4]
adds r3, r3, #87
mov r0, r3
bl make_awaitable_base()
ldr r2, [r7, #4]
ldr r3, [r7, #4]
add r0, r2, #21
adds r3, r3, #87
movs r2, #65
mov r1, r3
bl memcpy
ldr r3, [r7, #4]
movs r2, #0
strb r2, [r3, #86]
ldr r3, [r7, #4]
adds r3, r3, #21
mov r0, r3
bl Awaitable::await_ready()
This breaks my code, as I am relying on the fact that the object cannot be moved or copied. It was my understanding that making an object non-copyable and non-movable should prevent it from being memcpy'd.
- memcpy is no longer present in 13.1 - unfortunately, I am stuck with 11.3
- memcpy is not present if I remove the aggregate initialization of Awaitable wrapped around AwaitableBase (and instead make AwaitableBase itself the awaitable) - this doesn't work for me because I'd like to wrap other awaitables with Awaitable to modify their behavior
- memcpy is not present without the co_await - but I need the ready_ptr stored in the promise to check if the awaitable is done

How can I work around this?
Is it a bug in the compiler, or am I misunderstanding something about guaranteed copy elision? Is it undefined behavior to rely on the address of the temporary not changing for the duration of the co_await expression?
As pointed out in the comments, this is a GCC bug, where prvalues created by constructing objects in co_await expressions are erroneously treated as trivially copyable aggregates, creating a temporary that is memcpy'd from.
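For comparison, here is a minimal sketch of what the language guarantees outside of a co_await expression. The names Pinned, Wrapper, make_pinned, and address_is_stable are hypothetical and not from the question. Since C++17, initializing an aggregate member from a prvalue of the same type constructs that member in place, so the address recorded by the constructor should match the member's final address:

```cpp
// Hypothetical illustration; none of these names come from the question.
// Requires C++17 or later (it does not compile before guaranteed copy elision).
struct Pinned {
    Pinned() : constructed_at(this) {}   // remember where construction happened
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    Pinned& operator=(Pinned&&) = delete;
    const Pinned* constructed_at;
    char buffer[65];
};

// Returns a prvalue; no copy or move constructor is needed to return it.
Pinned make_pinned() { return Pinned{}; }

// Aggregate wrapper, shaped like Awaitable in the question.
struct Wrapper {
    Pinned base;
    bool ready{false};
};

bool address_is_stable() {
    Wrapper w{make_pinned()};                  // base is constructed directly inside w
    return w.base.constructed_at == &w.base;   // true if no copy was made
}
```

The same expectation carries over to the operand of co_await, which is why the memcpy seen with GCC 11.3 points to a compiler defect rather than to a misreading of copy elision.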
The fix is to never construct a non-trivial object directly in a co_await expression. E.g., co_await Class{ ... }, co_await function_call(Class{ ... }) and co_await Class{ ... }.member_function() are all prone to this bug.
You can replace these with co_await [&]{ return ...; }(); (which is co_await lambda_type(captured_references...)(), where the only object constructed directly in the co_await expression is the lambda, and that lambda type can safely be copied with memcpy).
You might want to macro-ify this to #define CO_AWAIT(...) co_await [&]() -> decltype(auto) { return __VA_ARGS__ ; }() so that you can search for any remaining lowercase co_await in your code base and replace it, eliminating this bug completely.