为什么 Collections.sort 使用 Mergesort 而 Arrays.sort 不使用?

我使用的是 JDK-8(x64):

这个排序算法是弗拉基米尔 · 雅罗斯拉夫斯基、乔恩 · 本特利和约书亚 · 布洛赫设计的双轴 快速排序

对于 Collections.sort(对象) ,我发现了这个“ Timsort”:

这个实现是一个稳定的、自适应的、迭代的 合并排序... ... 这个实现是 将指定的列表转储到数组中,对数组进行排序,它在列表中迭代,从数组中相应的位置重新设置每个元素。

如果 Collections.sort使用数组,为什么它不调用 Arrays.sort或者使用双轴 快速排序? 为什么使用 合并排序

63718 次浏览

I don't know about the documentation, but the implementation of java.util.Collections#sort in Java 8 (HotSpot) goes like this:

@SuppressWarnings({"unchecked", "rawtypes"})
public static <T> void sort(List<T> list, Comparator<? super T> c) {
list.sort(c);
}

And List#sort has this implementation:

@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
Object[] a = this.toArray();
Arrays.sort(a, (Comparator) c);
ListIterator<E> i = this.listIterator();
for (Object e : a) {
i.next();
i.set((E) e);
}
}

So, in the end, Collections#sort uses Arrays#sort (of object elements) behind the scenes. This implementation uses merge sort or tim sort.

According to the Javadoc, only primitive arrays are sorted using Quicksort. Object arrays are sorted with a Mergesort as well.

So Collections.sort seems to use the same sorting algorithm as Arrays.sort for Objects.

Another question would be why a different sort algorithm is used for primitive arrays than for Object arrays?

The API guarantees a stable sorting which Quicksort doesn’t offer. However, when sorting primitive values by their natural order you won’t notice a difference as primitive values have no identity. Therefore, Quicksort can used for primitive arrays and will be used when it is considered more efficient¹.

For objects you may notice, when objects with different identity which are deemed equal according to their equals implementation or the provided Comparator change their order. Therefore, Quicksort is not an option. So a variant of MergeSort is used, the current Java versions use TimSort. This applies to both, Arrays.sort and Collections.sort, though with Java 8, the List itself may override the sort algorithms.


¹ The efficiency advantage of Quicksort is needing less memory when done in-place. But it has a dramatic worst case performance and can’t exploit runs of pre-sorted data in an array, which TimSort does.

Therefore, the sorting algorithms were reworked from version to version, while staying in the now-misleadingly named class DualPivotQuicksort. Also, the documentation didn’t catch up, which shows, that it is a bad idea in general, to name an internally used algorithm in a specification, when not necessary.

The current situation (including Java 8 to Java 11) is as follows:

  • Generally, the sorting methods for primitive arrays will use Quicksort only under certain circumstances. For larger arrays, they will try to identify runs of pre-sorted data first, like TimSort does, and will merge them when the number of runs does not exceed a certain threshold. Otherwise they will fall back to Quicksort, but with an implementation that will fall back to Insertion sort for small ranges, which does not only affect small arrays, but also quick sort’s recursion.
  • sort(char[],…) and sort(short[],…) add another special case, to use Counting sort for arrays whose length exceeds a certain threshold
  • Likewise, sort(byte[],…) will use Counting sort, but with a much smaller threshold, which creates the biggest contrast to the documentation, as sort(byte[],…) never uses Quicksort. It only uses Insertion sort for small arrays and Counting sort otherwise.

As stated across many of the answers.

The Quicksort is used by Arrays.sort for sorting primitive collections because stability isn't required (you won't know or care if two identical ints were swapped in the sort)

MergeSort or more specifically Timsort is used by Arrays.sort for sorting collections of objects. Stability is required. Quicksort does not provide for stability, Timsort does.

Collections.sort delegates to Arrays.sort which is why you see the javadoc referencing the MergeSort.

Quick Sort has two major drawbacks when it comes to merge sort:

  • It's not stable while it comes to non primitive.
  • It doesn't guarantee n log n performance.

Stability is a non-issue for primitive types, as there is no notion of identity as distinct from (value) equality.

Stability is a big deal when sorting arbitrary objects. It's a nice side benefit that Merge Sort guarantees n log n (time) performance no matter what the input. That's why merge sort is selected to provide a stable sort (Merge Sort) to sort object references.