Functional programming: is immutability expensive?

The question is in two parts. The first is conceptual. The next looks at the same question more concretely in Scala.

  1. Does making use of only immutable data structures in a programming language make implementing certain algorithms/logic inherently more computationally expensive in practice? This draws on the fact that immutability is a core tenet of purely functional languages. Are there other factors that impact this?
  2. Let's take a more concrete example. Quicksort is generally taught and implemented using mutable operations on an in-memory data structure. How does one implement such a thing in a PURE functional way, with computational and storage overhead comparable to the mutable version? Specifically in Scala. I have included some crude benchmarks below.

Further details:

I come from an imperative programming background (C++, Java), and I have been exploring functional programming, specifically Scala.

Some of the primary principles of pure functional programming:

  1. Functions are first-class citizens.
  2. Functions do not have side effects, and hence objects/data structures are immutable.

Even though modern JVMs are extremely efficient at object creation, and garbage collection is very inexpensive for short-lived objects, it is probably still better to minimize object creation, right? At least in a single-threaded application where concurrency and locking are not an issue. Since Scala is a hybrid paradigm, one can choose to write imperative code with mutable objects if necessary. But, as someone who has spent a lot of years trying to reuse objects and minimize allocation, I would like a good understanding of the school of thought that would not even allow that.

As a specific case, I was a little surprised by this code snippet in this tutorial. It has a Java version of Quicksort followed by a neat-looking Scala implementation of the same.

Here is my attempt to benchmark the implementations. I haven't done detailed profiling yet. But my guess is that the Scala version is slower because the number of objects allocated is linear (one per recursive call). Is there any chance that tail-call optimization comes into play? If I am right, Scala supports tail-call optimization for self-recursive calls, so it should only be helping it. I am using Scala 2.8.

Java version

public class QuickSortJ {

    public static void sort(int[] xs) {
        sort(xs, 0, xs.length - 1);
    }

    static void sort(int[] xs, int l, int r) {
        if (r <= l) return;  // zero or one element: nothing to sort
        int pivot = xs[l];
        int a = l; int b = r;
        while (a <= b) {
            while (xs[a] <= pivot) a++;
            while (xs[b] > pivot) b--;
            if (a < b) swap(xs, a, b);
        }
        sort(xs, l, b);
        sort(xs, a, r);
    }

    static void swap(int[] arr, int i, int j) {
        int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }
}

Scala version

object QuickSortS {

  def sort(xs: Array[Int]): Array[Int] =
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      Array.concat(
        sort(xs filter (pivot >)),
        xs filter (pivot ==),
        sort(xs filter (pivot <)))
    }
}

Scala code to compare the implementations

import java.util.Date
import scala.testing.Benchmark

class BenchSort(sortfn: (Array[Int]) => Unit, name: String) extends Benchmark {

  val ints = new Array[Int](100000)

  override def prefix = name
  override def setUp = {
    val ran = new java.util.Random(5)
    for (i <- 0 to ints.length - 1)
      ints(i) = ran.nextInt()
  }
  override def run = sortfn(ints)
}

val benchImmut = new BenchSort(QuickSortS.sort, "Immutable/Functional/Scala")
val benchMut   = new BenchSort(QuickSortJ.sort, "Mutable/Imperative/Java   ")

benchImmut.main(Array("5"))
benchMut.main(Array("5"))

Results

Times in milliseconds for five consecutive runs

Immutable/Functional/Scala    467    178    184    187    183
Mutable/Imperative/Java        51     14     12     12     12

I don't think the Scala version is actually tail recursive, since you are using Array.concat.

Also, just because this is idiomatic Scala code doesn't mean it is the best way to do it.

The best way to do this would be to use one of Scala's built-in sorting functions. That way you get the immutability guarantee and know you have a speedy algorithm.

See Stack Overflow question How do I sort an array in Scala? for an example.
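As a minimal sketch of that suggestion (assuming the standard library's sorted, which returns a new sorted array and leaves the input untouched):

val xs = Array(3, 1, 2)
val ys = xs.sorted   // ys == Array(1, 2, 3); xs itself is unchanged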

Sorting an array is, like, the most imperative task in the universe. It is not surprising that many elegant 'immutable' strategies/implementations fare poorly on a 'sort an array' microbenchmark. This does not imply that immutability is expensive "in general", though. There are many tasks where immutable implementations will perform comparably to mutable ones, but array sorting often is not one of them.

If you're simply rewriting your imperative algorithms and data structures in a functional language, it will indeed be expensive and useless. To make things shine, you should use the features available only in functional programming: data structure persistence, lazy evaluation, etc.
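For example, a minimal sketch of both ideas (my own illustration, not from the answer above):

// Persistence: the new list shares all of xs instead of copying it
val xs = List(2, 3, 4)
val ys = 1 :: xs                 // O(1); xs is untouched and fully reused by ys

// Laziness: computed at most once, and only if actually used
lazy val doubled = xs.map(_ * 2)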

QuickSort is known to be faster when done in-place, so this is hardly a fair comparison!

Having said that... Array.concat? If nothing else, you're showing how a collection type optimised for imperative programming is particularly slow when you try and use it in a functional algorithm; almost any other choice would be faster!
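To make that concrete, here is a hedged sketch of the same algorithm over List, where each step allocates cons cells instead of whole intermediate arrays (an illustration of the point, not a tuned implementation):

def qsortList(xs: List[Int]): List[Int] = xs match {
  case Nil => Nil
  case pivot :: rest =>
    // list construction instead of Array.concat's whole-array copies
    qsortList(rest filter (_ < pivot)) ::: pivot :: qsortList(rest filter (_ >= pivot))
}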


Another very important point to consider, perhaps the most important issue when comparing the two approaches is: "how well does this scale out to multiple nodes/cores?"

Chances are, if you're looking for an immutable quicksort then you're doing so because you actually want a parallel quicksort. Wikipedia has some citations on this subject: http://en.wikipedia.org/wiki/Quicksort#Parallelizations

The Scala version can simply fork before the function recurses, allowing it to very quickly sort a list containing billions of entries if you have enough cores available.
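As a sketch of what forking before the recursion might look like (my own illustration using scala.concurrent.Future, which assumes Scala 2.10+; the depth cut-off is an added assumption, there to keep the pool from drowning in tiny tasks and blocking waits):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParQuickSort {
  def sort(xs: Array[Int], depth: Int = 4): Array[Int] =
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      if (depth <= 0)
        Array.concat(sort(xs filter (pivot >), 0),
                     xs filter (pivot ==),
                     sort(xs filter (pivot <), 0))
      else {
        val smaller = Future(sort(xs filter (pivot >), depth - 1)) // forked
        val larger  = sort(xs filter (pivot <), depth - 1)         // this thread
        Array.concat(Await.result(smaller, Duration.Inf),
                     xs filter (pivot ==),
                     larger)
      }
    }
}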

Right now, the GPU in my system has 128 cores available to me if I could just run the Scala code on it, and this is on a simple desktop system two years behind the current generation.

How would that stack up against the single-threaded imperative approach I wonder...

Perhaps the more important question is therefore:

"Given that individual cores aren't going to get any faster, and synchronisation/locking presents a real challenge for parallelisation, is mutability expensive?"

The cost of immutability in Scala

Here is a version that is nearly as fast as the Java one. ;)

object QuickSortS {
  def sort(xs: Array[Int]): Array[Int] = {
    val res = new Array[Int](xs.size)
    xs.copyToArray(res)
    QuickSortJ.sort(res)  // Java statics are called via the class name in Scala
    res
  }
}

This version makes a copy of the array, sorts the copy in place using the Java version, and returns the copy. Scala does not force you to use immutable structures internally.
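From the caller's point of view the function still behaves purely, since its input is never modified. A usage sketch:

val xs = Array(3, 1, 2)
val ys = QuickSortS.sort(xs)   // ys is a sorted copy; xs is left as-is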

So the benefit of Scala is that you can leverage mutability and immutability as you see fit. The disadvantage is that if you do that wrong you don't really get the benefits of immutability.

There are a bunch of things wrong with this as a benchmark of functional programming. Highlights include:

  • You're using primitives, which may have to be boxed/unboxed. You're not trying to test the overhead of wrapping primitive objects, you're trying to test immutability.
  • You've chosen an algorithm where in-place operation is unusually effective (and provably so). If you want to show that there exist algorithms that are faster when implemented mutably, then this is a good choice; otherwise, this is likely to be misleading.
  • You are using the wrong timing function. Use System.nanoTime.
  • The benchmark is too short for you to be confident that JIT compilation won't be a significant part of the measured time.
  • The array isn't split in an efficient manner.
  • Arrays are mutable, so using them with FP is a strange comparison anyway.

So, this comparison is a great illustration that you must understand your language (and algorithm) in detail in order to write high-performance code. But it's not a very good comparison of FP vs. non-FP. If you want that, check out Haskell vs. C++ at the Computer Languages Benchmark Game. The take-home message there is that the penalty is typically not more than a factor of 2 or 3 or so, but it really depends. (No promises that the Haskell folks have written the fastest algorithms possible either, but at least some of them probably tried! Then again, some of the Haskell code calls C libraries....)

Now, suppose you do want a more reasonable benchmark of Quicksort, recognizing that this is probably one of the worst cases for FP vs. mutable algorithms, and ignoring the data-structure issue (i.e. pretending that we can have an immutable Array):

object QSortExample {
  // Imperative mutable quicksort
  def swap(xs: Array[String])(a: Int, b: Int) {
    val t = xs(a); xs(a) = xs(b); xs(b) = t
  }
  def muQSort(xs: Array[String])(l: Int = 0, r: Int = xs.length - 1) {
    val pivot = xs((l + r) / 2)
    var a = l
    var b = r
    while (a <= b) {
      while (xs(a) < pivot) a += 1
      while (xs(b) > pivot) b -= 1
      if (a <= b) {
        swap(xs)(a, b)
        a += 1
        b -= 1
      }
    }
    if (l < b) muQSort(xs)(l, b)
    if (b < r) muQSort(xs)(a, r)
  }

  // Functional quicksort
  def fpSort(xs: Array[String]): Array[String] = {
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      val (small, big) = xs.partition(_ < pivot)
      if (small.length == 0) {
        val (bigger, same) = big.partition(_ > pivot)
        same ++ fpSort(bigger)
      }
      else fpSort(small) ++ fpSort(big)
    }
  }

  // Utility function to repeat something n times
  def repeat[A](n: Int, f: => A): A = {
    if (n <= 1) f else { f; repeat(n - 1, f) }
  }

  // This runs the benchmark
  def bench(n: Int, xs: Array[String], silent: Boolean = false) {
    // Utility to report how long something took
    def ptime[A](f: => A) = {
      val t0 = System.nanoTime
      val ans = f
      if (!silent) printf("elapsed: %.3f sec\n", (System.nanoTime - t0) * 1e-9)
      ans
    }

    if (!silent) print("Scala builtin ")
    ptime { repeat(n, {
      val ys = xs.clone
      ys.sorted
    }) }
    if (!silent) print("Mutable ")
    ptime { repeat(n, {
      val ys = xs.clone
      muQSort(ys)()
      ys
    }) }
    if (!silent) print("Immutable ")
    ptime { repeat(n, {
      fpSort(xs)
    }) }
  }

  def main(args: Array[String]) {
    val letters = (1 to 500000).map(_ => scala.util.Random.nextPrintableChar)
    val unsorted = letters.grouped(5).map(_.mkString).toList.toArray

    repeat(3, bench(1, unsorted, silent = true))  // Warmup
    repeat(3, bench(10, unsorted))                // Actual benchmark
  }
}

Note the modification to the functional Quicksort so it only goes through the data once if at all possible, and the comparison to the built-in sort. When we run it we get something like:

Scala builtin elapsed: 0.349 sec
Mutable elapsed: 0.445 sec
Immutable elapsed: 1.373 sec
Scala builtin elapsed: 0.343 sec
Mutable elapsed: 0.441 sec
Immutable elapsed: 1.374 sec
Scala builtin elapsed: 0.343 sec
Mutable elapsed: 0.442 sec
Immutable elapsed: 1.383 sec

So, aside from learning that trying to write your own sort is a bad idea, we find that there is a ~3x penalty for an immutable quicksort if the latter is implemented somewhat carefully. (You could also write a trisect method that returns three arrays: those less than, those equal, and those greater than the pivot. This might speed things up slightly more.)
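For the curious, a hedged sketch of that hypothetical trisect (my own, not benchmarked here): one pass over the data, three builders that never escape the function, and the same functional interface:

def trisect(xs: Array[String], pivot: String)
    : (Array[String], Array[String], Array[String]) = {
  val lt, eq, gt = Array.newBuilder[String]   // three separate builders
  for (x <- xs) {
    if (x < pivot) lt += x
    else if (x > pivot) gt += x
    else eq += x
  }
  (lt.result(), eq.result(), gt.result())
}

def fpSort3(xs: Array[String]): Array[String] =
  if (xs.length <= 1) xs
  else {
    val (lt, eq, gt) = trisect(xs, xs(xs.length / 2))
    fpSort3(lt) ++ eq ++ fpSort3(gt)
  }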

Since there are a few misconceptions flying around here, I’d like to clarify some points.

  • The “in-place” quicksort isn’t really in-place (and quicksort is not by definition in-place). It requires additional storage in the form of stack space for the recursive step, which is in the order of O(log n) in the best case, but O(n) in the worst case.

  • Implementing a functional variant of quicksort that operates on arrays defeats the purpose. Arrays are never immutable.

  • The “proper” functional implementation of quicksort uses immutable lists. It is of course not in-place but it’s got the same worst-case asymptotic runtime (O(n^2)) and space complexity (O(n)) as the procedural in-place version.

    On average, its running time is still on par with that of the in-place variant (O(n log n)). Its space complexity, however, is still O(n).


There are two obvious disadvantages of a functional quicksort implementation. In the following, let’s consider this reference implementation in Haskell (I don’t know Scala …) from the Haskell introduction:

qsort []     = []
qsort (x:xs) = qsort lesser ++ [x] ++ qsort greater
    where lesser  = filter (< x) xs
          greater = filter (>= x) xs
  1. The first disadvantage is the choice of the pivot element, which is very inflexible. The strength of modern quicksort implementations relies heavily on a smart choice of the pivot (compare “Engineering a sort function” by Bentley et al.). The above algorithm is poor in that regard, which degrades average performance considerably.

  2. Secondly, this algorithm uses list concatenation (instead of list construction), which is an O(n) operation. This doesn’t impact the asymptotic complexity but it’s a measurable factor (see the sketch after this list).
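That second disadvantage, at least, can be engineered away within the functional style. A hedged Scala sketch of the standard accumulator trick (my own addition, not from the answer): the already-sorted suffix is threaded through as rest, so the result is assembled from O(1) cons operations rather than O(n) concatenations:

def qsortAcc(xs: List[Int], rest: List[Int] = Nil): List[Int] = xs match {
  case Nil => rest
  case pivot :: tail =>
    val (smaller, notSmaller) = tail.partition(_ < pivot)
    // sort the upper part onto rest, push the pivot, then sort the lower part
    qsortAcc(smaller, pivot :: qsortAcc(notSmaller, rest))
}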

A third disadvantage is somewhat hidden: unlike the “in-place” variant, this implementation continually requests memory from the heap for the cons cells of the list and potentially scatters memory all over the place. As a result, this algorithm has a very poor cache locality. I don’t know whether smart allocators in modern functional programming languages can mitigate this – but on modern machines, cache misses have become a major performance killer.


What’s the conclusion? Unlike others, I wouldn’t say that quicksort is inherently imperative and that’s why it performs badly in an FP environment. Quite on the contrary, I would argue that quicksort is a perfect example of a functional algorithm: it translates seamlessly into an immutable environment, its asymptotic running time and space complexity are on par with the procedural implementation, and even its procedural implementation employs recursion.

But this algorithm still performs worse when constrained to an immutable domain. The reason for this is that the algorithm has the peculiar property of benefitting from a lot of (sometimes low-level) fine-tuning that can only be efficiently performed on arrays. A naive description of the quicksort misses all these intricacies (both in the functional and in the procedural variant).

After reading “Engineering a sort function” I can no longer consider quicksort an elegant algorithm. Implemented efficiently, it is a clunky mess, an engineer’s piece of work, not an artist’s (not to devalue engineering! this has its own aesthetic).


But I would also like to point out that this point is particular to quicksort. Not every algorithm is amenable to the same sort of low-level tweaking. A lot of algorithms and data structures really can be expressed without performance loss in an immutable environment.

And immutability can even decrease performance costs by removing the need of costly copies or cross-thread synchronizations.
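A minimal sketch of that last point (my own illustration): an immutable structure can be handed to any number of threads with no locks and no defensive copies:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val config = Map("host" -> "localhost", "port" -> "8080")   // immutable
// Safe to share freely: no thread can mutate it out from under another
val lookups = List("host", "port").map(k => Future(config.get(k)))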

So, to answer the original question, “is immutability expensive?” – In the particular case of quicksort, there is a cost that is indeed a result of immutability. But in general, no.

Immutability is not expensive. It sure can be expensive if you measure a small subset of the tasks a program has to do, and pick a solution based on mutability to boot -- such as measuring quicksort.

To put it simply, you don't quicksort when using purely functional languages.

Let's consider this from another angle. Let's consider these two functions:

// Version using mutable data structures
def tailFrom[T : ClassManifest](arr: Array[T], p: T => Boolean): Array[T] = {
  def posIndex(i: Int): Int = {
    if (i < arr.length) {
      if (p(arr(i)))
        i
      else
        posIndex(i + 1)
    } else {
      -1
    }
  }

  val index = posIndex(0)

  if (index < 0) Array.empty
  else {
    val result = new Array[T](arr.length - index)
    Array.copy(arr, index, result, 0, arr.length - index)
    result
  }
}

// Immutable data structure:
def tailFrom[T](list: List[T], p: T => Boolean): List[T] = {
  def recurse(sublist: List[T]): List[T] = {
    if (sublist.isEmpty) sublist
    else if (p(sublist.head)) sublist
    else recurse(sublist.tail)
  }
  recurse(list)
}

Benchmark THAT, and you'll find that the code using mutable data structures has much worse performance, because it needs to copy the array, while the immutable code need not concern itself with that.
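A usage sketch to make the contrast concrete:

tailFrom(Array(1, 2, 3, 4, 5), (_: Int) > 2)   // allocates and copies: Array(3, 4, 5)
tailFrom(List(1, 2, 3, 4, 5), (_: Int) > 2)    // just returns a suffix: List(3, 4, 5)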

When you program with immutable data structures, you structure your code to take advantage of its strengths. It is not simply the data type, or even individual algorithms. The program will be designed in a different manner.

Which is why benchmarking is usually meaningless. Either you choose algorithms that are natural to one style or another, and that style wins, or you benchmark the whole application, which is often impractical.

It's been said that OO programming uses abstraction to hide complexity, and functional programming uses immutability to remove complexity. In the hybrid world of Scala we can use OO to hide the imperative code leaving application code none the wiser. Indeed the collections libraries use plenty of imperative code but that doesn't mean we shouldn't use them. As others have said, used with care, you really get the best of both worlds here.
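To close with a sketch of that idea (my own illustration): a function can present a pure interface while a local mutable buffer does the work, which is essentially what the collections libraries do internally:

def squares(xs: List[Int]): List[Int] = {
  val buf = List.newBuilder[Int]   // mutable, but never escapes this scope
  for (x <- xs) buf += x * x
  buf.result()                     // callers only ever see an immutable List
}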