How do stackless coroutines differ from stackful coroutines?

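Background:

I'm asking this because I currently have an application with many (hundreds to thousands of) threads. Those threads are idle most of the time, waiting for work items to be placed in a queue. When a work item becomes available, it is processed by calling some arbitrarily complex existing code. On some operating system configurations, the application runs into the kernel parameters that govern the maximum number of user processes, so I'd like to experiment with ways to reduce the number of worker threads.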

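My proposed solution:

It seems that a coroutine-based approach, where I replace each worker thread with a coroutine, would help accomplish this. I can then have a work queue backed by a pool of actual (kernel) worker threads. When an item is placed in a particular coroutine's queue for processing, an entry would be placed into the thread pool's queue. A pool thread would then resume the corresponding coroutine, process its queued data, and suspend it again, freeing up the worker thread to do other work.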

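Implementation details:

In thinking about how I would do this, I'm having trouble understanding the functional differences between stackless and stackful coroutines. I have some experience using stackful coroutines from the Boost.Coroutine library. I find them relatively easy to comprehend at a conceptual level: for each coroutine, the library maintains a copy of the CPU context and stack, and when you switch to a coroutine, it switches to that saved context (just like a kernel-mode scheduler would).

What is less clear to me is how a stackless coroutine differs from this. In my application, the amount of overhead associated with the queuing of work items described above is very important. Most implementations that I've seen, like the new CO2 library, suggest that stackless coroutines provide much lower-overhead context switches.

Therefore, I'd like to understand the functional differences between stackless and stackful coroutines more clearly. Specifically, I have these questions: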

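  • References like this one suggest that the distinction lies in where you can yield/resume in a stackful vs. a stackless coroutine. Is this the case? Is there a simple example of something that I can do in a stackful coroutine but not in a stackless one?

  • Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?

  • Are there any limitations on what functions I can call from a stackless coroutine?

  • If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?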


First, thank you for taking a look at CO2 :)

The Boost.Coroutine doc describes the advantages of stackful coroutines well:

stackfulness

In contrast to a stackless coroutine a stackful coroutine can be suspended from within a nested stackframe. Execution resumes at exactly the same point in the code where it was suspended before. With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.

first-class continuation

A first-class continuation can be passed as an argument, returned by a function and stored in a data structure to be used later. In some implementations (for instance C# yield) the continuation can not be directly accessed or directly manipulated.

Without stackfulness and first-class semantics, some useful execution control flows cannot be supported (for instance cooperative multitasking or checkpointing).

What does that mean to you? For example, imagine you have a function that takes a visitor:

template<class Visitor>
void f(Visitor& v);

Suppose you want to turn it into an iterator. With a stackful coroutine, you can:

asymmetric_coroutine<T>::pull_type pull_from([](asymmetric_coroutine<T>::push_type& yield)
{
    f(yield);
});

But with a stackless coroutine, there is no way to do so:

generator<T> pull_from()
{
    // yield can only be used here, cannot pass to f
    f(???);
}
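To make the limitation concrete: in a stackless design, suspending amounts to returning from the coroutine body, so a visitor called from inside f has no way to suspend without unwinding f itself. The usual fallback, sketched below, is to run f to completion into a buffer and iterate over that afterwards, which gives up laziness. The body of f, BufferingVisitor and pull_all_from_f are made-up names for illustration; they are not part of CO2 or Boost.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the visitor-taking function f from the text:
// it "visits" the squares 1, 4, 9.
template <class Visitor>
void f(Visitor& v) {
    for (int i = 1; i <= 3; ++i) {
        v(i * i);
    }
}

// A visitor that cannot suspend; all it can do is record what it is given.
struct BufferingVisitor {
    std::vector<int> values;
    void operator()(int x) { values.push_back(x); }
};

// Eager replacement for the lazy pull_from(): f runs to completion first,
// then the caller iterates over the buffered results.
std::vector<int> pull_all_from_f() {
    BufferingVisitor visitor;
    f(visitor);
    return visitor.values;
}
```

With the stackful version, each value would be produced only when the consumer asks for it; here everything is computed up front, which is fine for small sequences but not for unbounded ones.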

In general, stackful coroutines are more powerful than stackless ones. So why do we want stackless coroutines? Short answer: efficiency.

A stackful coroutine typically needs to allocate a certain amount of memory to accommodate its runtime stack (which must be large enough), and its context switch is more expensive than a stackless one. For example, Boost.Coroutine takes 40 cycles while CO2 takes just 7 cycles on average on my machine, because the only thing that a stackless coroutine needs to restore is the program counter.
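As a rough illustration of why resuming is so cheap, here is a hand-written stackless coroutine in plain C++. It is only a sketch of the general technique (a switch on a saved resume point), not how CO2 is actually implemented; CountingCoroutine, resume_point and resume are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stackless coroutine that yields start, start+1, ...,
// start+count-1. Everything that must survive a suspension is a member.
struct CountingCoroutine {
    int resume_point = 0; // plays the role of the saved program counter
    int current = 0;      // "local" variable that lives across suspensions
    int end = 0;

    CountingCoroutine(int start, int count) : current(start), end(start + count) {
        assert(count >= 0);
    }

    // Resumes the body. Returns true and writes a value to `out` if the
    // coroutine yielded, or false once it has finished.
    bool resume(int& out) {
        switch (resume_point) {
        case 0:
            while (current < end) {
                out = current;
                resume_point = 1; // remember where to continue
                return true;      // "suspend" is just a return to the caller
        case 1:                   // resuming jumps straight back into the loop
                ++current;
            }
            resume_point = 2;     // finished
            return false;
        default:
            return false;
        }
    }
};
```

Resuming here costs one function call plus a jump on resume_point; no registers or stack pointer are swapped, which is the kind of saving the cycle counts above reflect.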

That said, with language support, a stackful coroutine could probably also take advantage of a compiler-computed maximum stack size, as long as there is no recursion in the coroutine, so its memory usage could be improved as well.

Speaking of stackless coroutines, bear in mind that "stackless" doesn't mean there is no runtime stack at all. It only means that the coroutine uses the same runtime stack as the host side, so you can call recursive functions as well; all the recursion simply happens on the host's runtime stack. In contrast, with a stackful coroutine, when you call recursive functions, the recursion happens on the coroutine's own stack.

To answer the questions:

  • Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?

No. It's the emulation limitation of CO2. With language support, the automatic storage variables visible to the coroutine will be placed in the coroutine's internal storage. Note my emphasis on "visible to the coroutine": if the coroutine calls a function that uses automatic storage variables internally, then those variables will be placed on the runtime stack. More specifically, a stackless coroutine only has to preserve the variables/temporaries that can be used after it is resumed.

To be clear, you can use automatic storage variables in CO2's coroutine body as well:

auto f() CO2_RET(co2::task<>, ())
{
    int a = 1; // not ok: the definition precedes an await
    CO2_AWAIT(co2::suspend_always{});
    {
        int b = 2; // ok: the scope ends before the next await
        doSomething(b);
    }
    CO2_AWAIT(co2::suspend_always{});
    int c = 3; // ok: no await follows
    doSomething(c);
} CO2_END

This works as long as the definition does not precede any await.

  • Are there any limitations on what functions I can call from a stackless coroutine?

No.

  • If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?

As answered above, a stackless coroutine doesn't care about the automatic storage variables used in the functions it calls; they are simply placed on the normal runtime stack.
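Here is a small sketch of that split, again as a hand-written coroutine rather than CO2 code (FactorialCoroutine and factorial are hypothetical names). The counter that has to survive a suspension is a member of the coroutine object, while the recursive helper's frames live on the caller's ordinary stack and are gone before the coroutine suspends.

```cpp
#include <cassert>

// Ordinary recursive function: its frames are pushed on the host's runtime
// stack and popped again before control returns to the coroutine body.
unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Hypothetical stackless coroutine that yields 1!, 2!, ..., limit!.
struct FactorialCoroutine {
    unsigned n = 1;  // survives suspension: stored in the coroutine object
    unsigned limit;

    explicit FactorialCoroutine(unsigned limit_) : limit(limit_) {
        assert(limit_ <= 20); // 20! is the largest factorial that fits in 64 bits
    }

    // Returns true and writes the next factorial to `out`, or false when done.
    bool resume(unsigned long long& out) {
        if (n > limit) {
            return false;
        }
        // `result` is a plain stack local; it is not needed after the
        // suspension, so it never has to be stored in the coroutine object.
        unsigned long long result = factorial(n);
        ++n;
        out = result;
        return true; // suspend by returning to the caller
    }
};
```

Nothing restricts what resume may call; the only constraint is that the callee cannot itself suspend the coroutine.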

If you have any doubts, just check CO2's source code; it may help you understand the mechanics under the hood ;)

What you want are user-land threads/fibers. Usually you want to suspend your code (running in a fiber) deep inside a nested call stack (for instance, while parsing messages from a TCP connection). In this case you cannot use stackless context switching: the application stack is shared between stackless coroutines, so the stack frames of called subroutines would be overwritten.

You can use something like boost.fiber, which implements user-land threads/fibers based on boost.context.