Scala actors: receive vs react

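Let me first say that I have quite a lot of Java experience, but have only recently become interested in functional languages. Recently I started looking at Scala, which seems like a very nice language.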

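However, I have been reading about Scala's Actor framework in Programming in Scala, and there is one thing I don't understand. Chapter 30.4 says that using react instead of receive makes it possible to reuse threads, which is good for performance, since threads are expensive in the JVM.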

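Does this mean that, as long as I remember to call react instead of receive, I can start as many Actors as I like? Before discovering Scala, I had been playing with Erlang, and the author of Programming Erlang boasts about spawning over 200,000 processes without breaking a sweat. I would hate to do that with Java threads. What kind of limits am I looking at in Scala, compared to Erlang (and Java)?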

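Also, how does this thread reuse work in Scala? Let's assume, for simplicity, that I have only one thread. Will all the actors I start run sequentially in this thread, or will some sort of task switching take place? For example, if I start two actors that ping-pong messages to each other, do I risk deadlock if they are started in the same thread?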

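According to Programming in Scala, writing actors to use react is more difficult than with receive. This sounds plausible, since react doesn't return. However, the book goes on to show how you can put a react inside a loop using Actor.loop. As a result, you get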

    loop {
        react {
            ...
        }
    }

which, to me, seems pretty similar to

    while (true) {
        receive {
            ...
        }
    }

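which is used earlier in the book. Still, the book says that "in practice, programs will need at least a few receive's". So what am I missing here? What can receive do that react cannot, besides return? And why do I care?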

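Finally, coming to the core of what I don't understand: the book keeps mentioning how using react makes it possible to discard the call stack in order to reuse the thread. How does that work? Why is it necessary to discard the call stack? And why can the call stack be discarded when a function terminates by throwing an exception (react), but not when it terminates by returning (receive)?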

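I have the impression that Programming in Scala has been glossing over some of the key issues here, which is a shame, because otherwise it is a truly excellent book.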


First, each actor waiting on receive occupies a thread. If it never receives anything, that thread never does anything. An actor waiting on react does not occupy any thread until it receives something. Once a message arrives, a thread is allocated to the actor, and the actor is initialized on it.

Now, the initialization part is important. A receiving thread is expected to return something; a reacting thread is not. So the stack state left at the end of the last react can be, and is, wholly discarded. Not needing to save or restore the stack state makes the thread faster to start.

There are various performance reasons why you might want one or the other. As you know, having too many threads in Java is not a good idea. On the other hand, because an actor has to be attached to a thread before it can react, it is faster to receive a message than to react to it. So if you have actors that receive many messages but do very little with them, the additional delay of react might make it too slow for your purposes.

The answer is "yes": if your actors are not blocking on anything in your code and you are using react, then you can run your "concurrent" program within a single thread (try setting the system property actors.maxPoolSize to 1 to find out).
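To make the single-thread case concrete, here is a minimal sketch in plain Scala on top of java.util.concurrent. It does not use scala.actors; MiniActor, send and run are hypothetical names made up for this illustration. Each message becomes a task on a one-thread executor, so two actors can ping-pong without deadlock: a handler runs to completion, and sending only queues a task instead of blocking.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, ExecutorService, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object SingleThreadPingPong {
  // Hypothetical mini-actor (not the scala.actors API): it owns no thread,
  // only a handler that the shared pool runs once per message.
  final class MiniActor(pool: ExecutorService, body: (MiniActor, MiniActor, Int) => Unit) {
    // Sending never blocks; it just queues a task for the worker thread.
    def send(from: MiniActor, n: Int): Unit =
      pool.execute(new Runnable {
        def run(): Unit = body(MiniActor.this, from, n)
      })
  }

  // Returns (number of messages handled, number of distinct threads used).
  def run(rounds: Int): (Int, Int) = {
    val pool    = Executors.newSingleThreadExecutor()
    val done    = new CountDownLatch(1)
    val handled = new AtomicInteger(0)
    val threads = ConcurrentHashMap.newKeySet[String]()

    // Both actors share this behaviour: count the message, then bounce it back.
    def body(self: MiniActor, from: MiniActor, n: Int): Unit = {
      handled.incrementAndGet()
      threads.add(Thread.currentThread().getName)
      if (n > 0) from.send(self, n - 1) else done.countDown()
    }

    val ping = new MiniActor(pool, body)
    val pong = new MiniActor(pool, body)
    ping.send(pong, rounds)

    val finished = done.await(5, TimeUnit.SECONDS)
    pool.shutdown()
    require(finished, "ping-pong did not finish in time")
    (handled.get(), threads.size())
  }
}
```

With rounds = 10 this handles 11 messages, all on the same worker thread. The actors take turns because each handler returns before the next queued task starts, which is the kind of task switching the question asks about.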

One of the more obvious reasons why it is necessary to discard the call stack is that otherwise the loop method would end in a StackOverflowError. As it is, the framework rather cleverly ends a react by throwing a SuspendActorException, which is caught by the looping code, which then runs the react again via the andThen method.

Have a look at the mkBody method in Actor and then the seq method to see how the loop reschedules itself - terribly clever stuff!
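The rescheduling trick can be modelled in a few lines. This is only a sketch of the idea, not the code in Actor: MiniActor, Suspend, start and deliver are hypothetical names, and the real framework does considerably more (mailboxes, schedulers, andThen). Here react stores the handler and throws a control exception to unwind the stack, and loop records "run the body again" as the continuation to call once the handler has finished.

```scala
import scala.util.control.ControlThrowable

object ReactSketch {
  // Hypothetical stand-in for the framework's suspend signal.
  final class Suspend extends ControlThrowable

  final class MiniActor {
    private var pending: Option[PartialFunction[Any, Unit]] = None
    private var next: () => Unit = () => ()   // continuation run after a handler

    // Stores the handler, then unwinds the caller's stack. Never returns.
    def react(handler: PartialFunction[Any, Unit]): Nothing = {
      pending = Some(handler)
      throw new Suspend
    }

    // Runs the body once and arranges for it to run again after the next message.
    def loop(body: => Unit): Unit = {
      next = () => loop(body)
      body   // expected to end in react, which throws Suspend
    }

    // Runs code until it suspends in react; the Suspend is swallowed here.
    private def runUntilSuspended(code: => Unit): Unit =
      try code catch { case _: Suspend => () }

    def start(body: => Unit): Unit = runUntilSuspended(body)

    // Called by the "scheduler" for each message, always from a shallow stack.
    def deliver(msg: Any): Unit =
      pending.foreach { h =>
        pending = None
        runUntilSuspended { h(msg); next() }
      }
  }

  // Feeds `messages` integers to a looping actor and returns how many it handled.
  def run(messages: Int): Int = {
    var handled = 0
    val actor = new MiniActor
    actor.start {
      actor.loop {
        actor.react { case _: Int => handled += 1 }
      }
    }
    for (i <- 1 to messages) actor.deliver(i)
    handled
  }
}
```

Every deliver starts from the same shallow stack, so run(100000) completes normally. If react instead called the next iteration directly without unwinding, each message would add frames, and a long-running loop would very likely hit a StackOverflowError.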

Those statements about "discarding the stack" also confused me for a while. I think I get it now, and this is my current understanding. In the case of "receive", there is a dedicated thread blocking on the message (using object.wait() on a monitor). This means the complete thread stack is available and ready to continue from the point where it was waiting for a message. For example, if you had the following code

    val a = 10
    while (!done) {
        receive {
            case msg => println("MESSAGE RECEIVED: " + msg)
        }
        println("after receive and printing a " + a)
    }

the thread would wait in the receive call until a message is received, and would then continue and print the "after receive and printing a 10" message, using the value 10 that was in the stack frame before the thread blocked.
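The same behaviour can be reproduced without the actor library, as a rough model only. In this sketch a LinkedBlockingQueue stands in for the actor's mailbox and its blocking take() stands in for receive (the real implementation waits on a monitor, as noted above); BlockingReceiveSketch and its members are hypothetical names.

```scala
import java.util.concurrent.LinkedBlockingQueue

object BlockingReceiveSketch {
  def run(): String = {
    val mailbox = new LinkedBlockingQueue[String]()
    val result  = new LinkedBlockingQueue[String]()

    val worker = new Thread(new Runnable {
      def run(): Unit = {
        val a = 10                 // lives in this thread's stack frame
        val msg = mailbox.take()   // blocks; the thread and its stack stay parked here
        // Execution resumes right after the blocking call, with `a` still in scope.
        result.put("got " + msg + ", a = " + a)
      }
    })

    worker.start()
    mailbox.put("hello")           // wakes the parked worker
    worker.join()
    result.take()
  }
}
```

Calling BlockingReceiveSketch.run() returns "got hello, a = 10". The cost is that the worker thread exists, and holds its stack, for the whole time it is waiting.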

In the case of react there is no such dedicated thread. The whole body of the react call is captured as a closure and is executed by some arbitrary thread when the corresponding actor receives a message. This means that only the statements captured in the closure will be executed, and that is where the return type of "Nothing" comes into play. Consider the following code

    val a = 10
    while (!done) {
        react {
            case msg => println("MESSAGE RECEIVED: " + msg)
        }
        println("after react and printing a " + a)
    }

If react had a return type of void, it would be legal to have statements after the "react" call (in the example, the println statement that prints the message "after react and printing a 10"). In reality that statement would never be executed, because only the body of the "react" call is captured and sequenced for execution later (on the arrival of a message). Since the contract of react has the return type "Nothing", there cannot be any meaningful statements following react, and therefore there is no reason to maintain the stack. In the example above, the variable "a" would not have to be maintained, because the statements after the react call are not executed at all. Note that all the variables needed by the body of react are already captured in the closure, so it can execute just fine.
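A small sketch shows both halves of that argument: code after a Nothing-typed call never runs, and the handler closure keeps whatever it captured even after the calling frame is gone. This is an illustration under assumptions, not the library's react: the names are hypothetical, and the handler returns a String here (the real one returns Unit) so the result is easy to inspect.

```scala
import scala.util.control.ControlThrowable

object NothingSketch {
  // Hypothetical suspend signal, as in the earlier discussion of react.
  final class Suspend extends ControlThrowable

  private var saved: Option[PartialFunction[Any, String]] = None

  // Return type Nothing: this method can only exit by throwing.
  def react(handler: PartialFunction[Any, String]): Nothing = {
    saved = Some(handler)
    throw new Suspend
  }

  // Returns (was the line after react reached?, reply from the saved handler).
  def run(): (Boolean, String) = {
    var reachedAfterReact = false

    def body(): Unit = {
      val a = 10
      react { case msg => "message " + msg + ", a = " + a }  // closure captures a
      reachedAfterReact = true   // compiles, but is never executed
    }

    try body() catch { case _: Suspend => () }

    // body's stack frame is gone by now, yet the closure still holds a = 10.
    val reply = saved.map(h => h("hi")).getOrElse("no handler")
    (reachedAfterReact, reply)
  }
}
```

NothingSketch.run() returns (false, "message hi, a = 10"): the assignment after react is skipped, while the handler still sees the captured value.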

The Java actor framework Kilim, by contrast, does maintain the stack: it saves the stack, which is then restored when the actor gets a message.

Just to have it here:

Event-Based Programming without Inversion of Control

These papers are linked from the Scala API for Actor and provide the theoretical framework for the actor implementation. This includes why react may never return.

I haven't done any major work with Scala/Akka, but I understand that there is a very significant difference in the way actors are scheduled. Akka is just a smart thread pool that time-slices the execution of actors. Each time slice is one message processed to completion by an actor, unlike in Erlang, where it could be per instruction?!

This leads me to think that react is better, as it hints to the current thread that it should consider other actors for scheduling, whereas receive "might" keep the current thread busy executing further messages for the same actor.