[SOLVED] boost::coroutine2 vs CoroutineTS


boost::coroutine2 and CoroutineTS (C++20) are popular coroutine implementations in C++. Both support suspend and resume, but the two implementations follow quite different approaches.

CoroutineTS(C++20)

Stackless
Suspend by return
Uses special keywords

generator<int> Generate()
{
   co_yield 1;   // co_yield needs an operand; 1 is a placeholder value
}

boost::coroutine2

Stackful
Suspend by call
Does not use special keywords

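#include <boost/coroutine2/all.hpp>

using coro_t = boost::coroutines2::coroutine<void>;

coro_t::pull_type source([](coro_t::push_type& sink)
{
   sink();   // suspends by calling back into the caller's context
});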

Are there any specific use cases where I should select only one of them?

Solution

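The main technical distinction is whether you want to be able to yield from within a nested call. This cannot be done directly using stackless coroutines: only the coroutine's own frame is preserved across a suspension, so the suspension point has to sit in the coroutine body itself, not in an ordinary function it calls.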

Another thing to consider is that stackful coroutines have a stack and context (such as signal masks, the stack pointer, the CPU registers, etc.) of their own, so they have a larger memory footprint than stackless coroutines. This can be an issue, especially on a resource-constrained system or when a very large number of coroutines exist simultaneously.

I have no idea how they compare performance-wise in the real world, but in general, stackless coroutines are more efficient, as they have less overhead (stackless task switches do not have to swap stacks, store/load registers, or restore the signal mask, etc.).

For an example of a minimal stackless coroutine implementation, see Simon Tatham's coroutines using Duff's Device. It is pretty intuitive that they are as efficient as you can get.
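To make the idea concrete, here is a minimal sketch of a switch-based stackless coroutine in the spirit of that technique. The names `CountTo` and `collect` are hypothetical and do not come from Tatham's article, which uses C macros; this version keeps the state in a struct instead.

```cpp
#include <vector>

// Hypothetical stackless generator: everything that must survive a
// suspension lives in the object, and the switch jumps back to the
// point right after the last "yield".
struct CountTo {
    int state = 0;   // resume point: 0 = start, 1 = inside the loop
    int i = 0;       // loop variable, kept outside the function's stack frame
    int limit;

    explicit CountTo(int n) : limit(n) {}

    // Returns true and writes the next value to `out`, or returns false
    // once the sequence is exhausted.
    bool next(int& out) {
        switch (state) {
        case 0:
            for (i = 1; i <= limit; ++i) {
                out = i;
                state = 1;
                return true;   // "suspend by return"
        case 1:;               // a resume lands here, inside the loop body
            }
        }
        state = 2;             // finished: no case label matches any more
        return false;
    }
};

// Drains the generator into a vector.
std::vector<int> collect(int n) {
    CountTo gen(n);
    std::vector<int> values;
    int v = 0;
    while (gen.next(v)) values.push_back(v);
    return values;
}
```

A suspended `CountTo` costs three `int`s and a resume is one function call plus a jump, which is where the small footprint and low switching overhead come from.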

Also, this question has nice answers that go more into details about the differences between stackful and stackless coroutines.

How to yield from a nested call in stackless coroutines?

Even though I said it's not possible, that was not 100% true: You can use (at least two) tricks to achieve this, each with some drawbacks: First, you have to convert every call that should be able to yield your calling coroutine into a coroutine as well. Now, there are two ways:

The trampoline approach: You simply call the child coroutine from the parent coroutine in a loop until it returns. Every time you notify the child coroutine and it does not finish, you also yield the calling coroutine. Note that this approach forbids calling the child coroutine directly; you always have to call the outermost coroutine, which then has to re-enter the whole call stack. This has a call and return complexity of O(n) for nesting depth n. If you are waiting for an event, the event simply has to notify the outermost coroutine.
The parent link approach: You pass the parent coroutine's address to the child coroutine and yield the parent coroutine; the child coroutine manually resumes the parent coroutine once it finishes. Note that this approach forbids calling any coroutine besides the innermost coroutine directly. This approach has a call and return complexity of O(1), so it is generally preferable. The drawback is that you have to manually register the innermost coroutine somewhere, so that the next event that wants to resume the outer coroutine knows which inner coroutine to target directly.
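The trampoline approach can be sketched with hand-written stackless coroutines. Everything here (`TrampolineChild`, `TrampolineParent`, `run_trampoline`, and the convention that `resume()` returns true while the coroutine is only suspended) is a hypothetical illustration, not the API of any library.

```cpp
#include <string>
#include <vector>

// resume() returns true while the coroutine is merely suspended and
// false once it has finished (an assumed convention for this sketch).
struct TrampolineChild {
    int state = 0;
    std::vector<std::string>* log;

    explicit TrampolineChild(std::vector<std::string>* l) : log(l) {}

    bool resume() {
        switch (state) {
        case 0:
            log->push_back("child: step 1");
            state = 1;
            return true;    // suspended, wants to be resumed again
        case 1:
            log->push_back("child: step 2");
            state = 2;
            return true;
        case 2:
            log->push_back("child: done");
            state = 3;
            return false;   // finished
        default:
            return false;
        }
    }
};

struct TrampolineParent {
    int state = 0;
    TrampolineChild child;
    std::vector<std::string>* log;

    explicit TrampolineParent(std::vector<std::string>* l) : child(l), log(l) {}

    bool resume() {
        if (state == 0) {
            log->push_back("parent: before child");
            state = 1;
        }
        if (state == 1) {
            // Trampoline: re-enter the child; while it is only suspended,
            // suspend the parent too so the yield reaches the driver.
            if (child.resume()) return true;
            log->push_back("parent: after child");
            state = 2;
        }
        return false;
    }
};

// The driver only ever resumes the outermost coroutine.
std::vector<std::string> run_trampoline(int& resumes) {
    std::vector<std::string> log;
    TrampolineParent parent(&log);
    resumes = 0;
    bool suspended = true;
    while (suspended) {
        suspended = parent.resume();
        ++resumes;
    }
    return log;
}
```

Every resume enters `TrampolineParent::resume()` first and then walks down to the child, which is the O(n) re-entry cost for nesting depth n.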

Note: By call and return complexity I mean the number of steps taken when notifying a coroutine to resume it, and the steps taken after notifying it to return to the calling notifier again.
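The parent link approach can be sketched the same way. The `Coroutine` interface, the `Scheduler` registry, `LinkedChild`, `LinkedParent`, and `run_parent_link` are all hypothetical names chosen for illustration; a real event loop would keep one registered coroutine per pending event.

```cpp
#include <string>
#include <vector>

struct Coroutine {
    virtual void resume() = 0;
    virtual ~Coroutine() = default;
};

// Hypothetical registry: the coroutine the next event should resume.
struct Scheduler {
    Coroutine* innermost = nullptr;
    std::vector<std::string> log;
};

struct LinkedChild : Coroutine {
    Scheduler& sched;
    Coroutine* parent;   // the parent link
    int state = 0;

    LinkedChild(Scheduler& s, Coroutine* p) : sched(s), parent(p) {}

    void resume() override {
        switch (state) {
        case 0:
            sched.log.push_back("child: step 1");
            state = 1;
            sched.innermost = this;      // register as the event target
            return;                      // suspend
        case 1:
            sched.log.push_back("child: step 2");
            state = 2;
            return;                      // suspend again, still registered
        case 2:
            sched.log.push_back("child: done");
            state = 3;
            sched.innermost = nullptr;   // nothing is waiting any more
            parent->resume();            // hand control straight to the parent
            return;
        default:
            return;
        }
    }
};

struct LinkedParent : Coroutine {
    Scheduler& sched;
    LinkedChild child;
    int state = 0;

    explicit LinkedParent(Scheduler& s) : sched(s), child(s, this) {}

    void resume() override {
        switch (state) {
        case 0:
            sched.log.push_back("parent: before child");
            state = 1;
            child.resume();   // start the child, then stay suspended
            return;
        case 1:
            sched.log.push_back("parent: after child");
            state = 2;
            return;
        default:
            return;
        }
    }
};

// Each simulated event resumes the registered innermost coroutine directly.
std::vector<std::string> run_parent_link(int& events) {
    Scheduler sched;
    LinkedParent parent(sched);
    parent.resume();   // initial start
    events = 0;
    while (sched.innermost != nullptr) {
        sched.innermost->resume();
        ++events;
    }
    return sched.log;
}
```

Here an event never touches `LinkedParent` until the child has finished, so each notification takes a constant number of steps regardless of nesting depth, at the price of maintaining `Scheduler::innermost` by hand.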