Declaring multiple arrays with 64 elements 1000 times faster than declaring array of 65 elements

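Recently I noticed that declaring an array containing 64 elements is a lot faster (> 1000 fold) than declaring the same type of array with 65 elements.

Here is the code I used to test this: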

public class Tests {
    public static void main(String[] args) {
        long start = System.nanoTime();
        int job = 100000000; // 100 million
        for (int i = 0; i < job; i++) {
            double[] test = new double[64];
        }
        long end = System.nanoTime();
        System.out.println("Total runtime = " + (end - start) / 1000000.0 + " ms");
    }
}

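This runs in approximately 6 ms. If I replace new double[64] with new double[65], it takes approximately 7 seconds. The problem becomes exponentially more severe when the job is spread across more and more threads, which is where my problem originates.

The problem also occurs with other array types such as int[65] or String[65]. It does not occur with large strings: String test = "many characters";, but it does start occurring when this is changed to String test = i + "";

I would like to know why this happens and whether it is possible to circumvent the problem.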

You are observing a behavior that is caused by the optimizations done by the JIT compiler of your Java VM. This behavior is reproducibly triggered with scalar arrays of up to 64 elements, and is not triggered with arrays larger than 64 elements.

Before going into details, let's take a closer look at the body of the loop:

double[] test = new double[64];

The body has no observable effect. That is, it makes no difference outside of the program execution whether this statement is executed or not. The same is true for the whole loop. So the code optimizer may well translate the loop into something (or nothing) with the same functional behavior but different timing behavior.

For benchmarks you should at least adhere to the following two guidelines. If you had done so, the difference would have been significantly smaller.

  • Warm up the JIT compiler (and optimizer) by executing the benchmark several times.
  • Use the result of every expression and print it at the end of the benchmark.
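As a rough illustration of these two guidelines, here is a minimal sketch in Java. The class name AllocationBenchmark, the helper allocate, and the iteration counts are made up for this example and are not part of the original test. The sketch does not switch off any optimization; it only makes the measurement less misleading, and the timings it prints will vary from machine to machine. For serious measurements a dedicated harness such as JMH is probably the better choice.

```java
class AllocationBenchmark {
    // Hypothetical helper: allocates `iterations` arrays of `size` doubles and
    // folds one element of each into a checksum, so every allocation is used.
    static double allocate(int size, int iterations) {
        double checksum = 0;
        for (int i = 0; i < iterations; i++) {
            double[] test = new double[size];
            test[i % size] = i;
            checksum += test[i % size];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // arbitrary value chosen for this sketch
        double sink = 0;

        // Guideline 1: run the benchmark several times untimed so the JIT
        // compiler has a chance to compile and optimize allocate().
        for (int round = 0; round < 5; round++) {
            sink += allocate(64, iterations);
            sink += allocate(65, iterations);
        }

        // Timed runs, after warm-up.
        for (int size : new int[] {64, 65}) {
            long start = System.nanoTime();
            sink += allocate(size, iterations);
            long end = System.nanoTime();
            System.out.println("size " + size + ": " + (end - start) / 1e6 + " ms");
        }

        // Guideline 2: print the accumulated result at the end so that the
        // work cannot be treated as having no observable effect.
        System.out.println("checksum = " + sink);
    }
}
```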

Now let's go into details. Not surprisingly, there is an optimization that is triggered for scalar arrays not larger than 64 elements. The optimization is part of escape analysis. It puts small objects and small arrays onto the stack instead of allocating them on the heap - or, even better, optimizes them away entirely. Brian Goetz wrote about it in an article from 2005.

The optimization can be disabled with the command line option -XX:-DoEscapeAnalysis. The magic value 64 for scalar arrays can also be changed on the command line. If you execute your program as follows, there will be no difference between arrays with 64 and 65 elements:

java -XX:EliminateAllocationArraySizeLimit=65 Tests

Having said that, I strongly discourage using such command line options. I doubt that they make a huge difference in a realistic application. I would only use them if I were absolutely convinced of the necessity - and not based on the results of some pseudo benchmarks.
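If you only want to see which values your own JVM uses for the two options mentioned above, without changing anything, something like the following sketch may help. It assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name EscapeAnalysisFlags and the helper flagValue are invented for this example. On a JVM that does not know an option, the helper simply reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class EscapeAnalysisFlags {
    // Hypothetical helper: returns the current value of a HotSpot option as a
    // string, or "unavailable" if this JVM does not expose such an option.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Covers unknown option names and JVMs without this MXBean.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("DoEscapeAnalysis = " + flagValue("DoEscapeAnalysis"));
        System.out.println("EliminateAllocationArraySizeLimit = "
                + flagValue("EliminateAllocationArraySizeLimit"));
    }
}
```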

There are any number of ways that there can be a difference, based on the size of an object.

As nosid stated, the JITC may be (most likely is) allocating small "local" objects on the stack, and the size cutoff for "small" arrays may be at 64 elements.

Allocating on the stack is significantly faster than allocating on the heap, and, more to the point, the stack does not need to be garbage collected, so GC overhead is greatly reduced. (And for this test case GC overhead is likely 80-90% of the total execution time.)

Further, once the value is stack-allocated the JITC can perform "dead code elimination", determine that the result of the new is never used anywhere, and, after assuring there are no side-effects that would be lost, eliminate the entire new operation, and then the (now empty) loop itself.
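To make the distinction concrete, here is a small sketch in Java of an allocation that stays local next to one that escapes. The class EscapeSketch, its methods, and the static field kept are invented for illustration. Whether a particular JVM actually removes the allocation in sumLocal depends on its JIT compiler and settings; the code only shows which allocation it would be permitted to remove.

```java
class EscapeSketch {
    // Anything stored here stays reachable after the method returns.
    static double[] kept;

    // The array never leaves this method, so the JIT compiler is free to
    // replace it with plain local values and drop the allocation.
    static double sumLocal(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double[] pair = new double[2];
            pair[0] = i;
            pair[1] = 1;
            sum += pair[0] + pair[1];
        }
        return sum;
    }

    // Storing the array in a static field makes it escape, so the
    // allocation has an observable effect and cannot simply be removed.
    static double sumEscaping(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double[] pair = new double[2];
            pair[0] = i;
            pair[1] = 1;
            kept = pair;
            sum += pair[0] + pair[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both methods compute the same value; only the allocation behavior
        // the JIT compiler is allowed to choose differs.
        System.out.println(sumLocal(1_000_000) + " " + sumEscaping(1_000_000));
    }
}
```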

Even if the JITC does not do stack allocation, it's entirely possible for objects smaller than a certain size to be allocated on the heap differently (e.g., from a different "space") than larger objects. (Normally this would not produce quite so dramatic timing differences, though.)