为什么处理排序的数组 * 比处理未排序的数组 * 慢? (Java 的 ArrayList.indexOf)

标题参照 为什么处理排序的数组比处理未排序的数组更快?

这也是一个分支预测效果吗? 注意: 这里排序数组的处理是 慢一点! !

考虑以下代码:

private static final int LIST_LENGTH = 1000 * 1000;
private static final long SLOW_ITERATION_MILLIS = 1000L * 10L;


@Test
public void testBinarySearch() {
Random r = new Random(0);
List<Double> list = new ArrayList<>(LIST_LENGTH);
for (int i = 0; i < LIST_LENGTH; i++) {
list.add(r.nextDouble());
}
//Collections.sort(list);
// remove possible artifacts due to the sorting call
// and rebuild the list from scratch:
list = new ArrayList<>(list);


int nIterations = 0;
long startTime = System.currentTimeMillis();
do {
int index = r.nextInt(LIST_LENGTH);
assertEquals(index, list.indexOf(list.get(index)));
nIterations++;
} while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;
System.out.println(slowFindsPerSec);


...
}

这在我的机器上打印出一个大约720的值。

现在,如果我激活集合排序调用,该值将下降到142。为什么? ! ?

结果 是结论性的,如果我增加迭代次数/时间,它们不会改变。

Java 版本是1.8.0 _ 71(Oracle VM,64位) ,运行在 Windows 10下,在 Eclipse Mars 中进行 JUnit 测试。

更新

似乎与连续内存访问有关(按顺序访问的双对象与按随机顺序访问的双对象)。对于我来说,当数组长度在10k 左右或更小时,效果开始消失。

感谢 assylias 提供 结果:

/**
* Benchmark                     Mode  Cnt  Score   Error  Units
* SO35018999.shuffled           avgt   10  8.895 ± 1.534  ms/op
* SO35018999.sorted             avgt   10  8.093 ± 3.093  ms/op
* SO35018999.sorted_contiguous  avgt   10  1.665 ± 0.397  ms/op
* SO35018999.unsorted           avgt   10  2.700 ± 0.302  ms/op
*/
4660 次浏览

I think we are seeing the effect of memory cache misses:

When you create the unsorted list

for (int i = 0; i < LIST_LENGTH; i++) {
list.add(r.nextDouble());
}

all the double are most likely allocated in a contiguous memory area. Iterating through this will produce few cache misses.

On the other hand in the sorted list the references point to memory in a chaotic manner.

Now if you create a sorted list with contiguous memory:

Collection.sort(list);
List<Double> list2 = new ArrayList<>();
for (int i = 0; i < LIST_LENGTH; i++) {
list2.add(new Double(list.get(i).doubleValue()));
}

this sorted list has the same performance than the original one (my timing).

It looks like caching / prefetching effect.

The clue is that you compare Doubles (objects), not doubles (primitives). When you allocate objects in one thread, they are typically allocated sequentially in memory. So when indexOf scans a list, it goes through sequential memory addresses. This is good for CPU cache prefetching heuristics.

But after you sort the list, you still have to do the same number of memory lookups in average, but this time memory access will be in random order.

UPDATE

Here is the benchmark to prove that the order of allocated objects matters.

Benchmark            (generator)  (length)  (postprocess)  Mode  Cnt  Score   Error  Units
ListIndexOf.indexOf       random   1000000           none  avgt   10  1,243 ± 0,031  ms/op
ListIndexOf.indexOf       random   1000000           sort  avgt   10  6,496 ± 0,456  ms/op
ListIndexOf.indexOf       random   1000000        shuffle  avgt   10  6,485 ± 0,412  ms/op
ListIndexOf.indexOf   sequential   1000000           none  avgt   10  1,249 ± 0,053  ms/op
ListIndexOf.indexOf   sequential   1000000           sort  avgt   10  1,247 ± 0,037  ms/op
ListIndexOf.indexOf   sequential   1000000        shuffle  avgt   10  6,579 ± 0,448  ms/op

As a simple example that confirms the answer by wero and the answer by apangin (+1!): The following does a simple comparison of both options:

  • Creating random numbers, and sorting them optionally
  • Creating sequential numbers, and shuffling them optionally

It is also not implemented as a JMH benchmark, but similar to the original code, with only slight modifications to observe the effect:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;


public class SortedListTest
{
private static final long SLOW_ITERATION_MILLIS = 1000L * 3L;


public static void main(String[] args)
{
int size = 100000;
testBinarySearchOriginal(size, true);
testBinarySearchOriginal(size, false);
testBinarySearchShuffled(size, true);
testBinarySearchShuffled(size, false);
}


public static void testBinarySearchOriginal(int size, boolean sort)
{
Random r = new Random(0);
List<Double> list = new ArrayList<>(size);
for (int i = 0; i < size; i++)
{
list.add(r.nextDouble());
}
if (sort)
{
Collections.sort(list);
}
list = new ArrayList<>(list);


int count = 0;
int nIterations = 0;
long startTime = System.currentTimeMillis();
do
{
int index = r.nextInt(size);
if (index == list.indexOf(list.get(index)))
{
count++;
}
nIterations++;
}
while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;


System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
size, sort, slowFindsPerSec, count);
}


public static void testBinarySearchShuffled(int size, boolean sort)
{
Random r = new Random(0);
List<Double> list = new ArrayList<>(size);
for (int i = 0; i < size; i++)
{
list.add((double) i / size);
}
if (!sort)
{
Collections.shuffle(list);
}
list = new ArrayList<>(list);


int count = 0;
int nIterations = 0;
long startTime = System.currentTimeMillis();
do
{
int index = r.nextInt(size);
if (index == list.indexOf(list.get(index)))
{
count++;
}
nIterations++;
}
while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
long duration = System.currentTimeMillis() - startTime;
double slowFindsPerSec = (double) nIterations / duration * 1000;


System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
size, sort, slowFindsPerSec, count);
}


}

The output on my machine is

Size   100000 sort  true iterations   8560,333 count      25681
Size   100000 sort false iterations  19358,667 count      58076
Size   100000 sort  true iterations  18554,000 count      55662
Size   100000 sort false iterations   8845,333 count      26536

nicely showing that the timings are exactly the opposites of another: If random numbers are sorted, then the sorted version is slower. If sequential numbers are shuffled, then the shuffled version is slower.