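Is the creation of Java class files deterministic?

When using the same JDK (i.e. the same javac executable), are the generated class files always identical? Can there be a difference depending on the operating system or hardware? Apart from the JDK version, are there any other factors that could cause differences? Are there any compiler options to avoid differences? Is a difference only possible in theory, or does Oracle's javac actually produce different class files for the same input and compiler options?

Update 1: I am interested in the generation, i.e. the compiler output, not in whether a class file can be run on different platforms.

Update 2: By "same JDK", I also mean the same javac executable.

Update 3: Distinction between theoretical and practical differences in Oracle's compilers.

[EDIT, adding paraphrased question]
"What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?"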


There is no obligation for compilers to produce the same bytecode on each platform. For a specific answer, consult the documentation of each vendor's javac.


I will show a practical example for this with file ordering.

Let's say that we have two jar files, my1.jar and My2.jar, placed side by side in the lib directory. The compiler reads them in alphabetical order (since this is lib), but that order is my1.jar, My2.jar when the file system is case-insensitive, and My2.jar, my1.jar when it is case-sensitive.

my1.jar contains a class A with a method:

public class A {
    public static void a(String s) {}
}

My2.jar contains the same class A, but with a different method signature (it accepts Object):

public class A {
    public static void a(Object o) {}
}

It is clear that if you have a call

String s = "x";
A.a(s);

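the compiler will resolve it against whichever A it finds first, and so emit a call with a different method descriptor in each case: a(Ljava/lang/String;)V or a(Ljava/lang/Object;)V. So, depending on the case sensitivity of your file system, you get a different class file as a result.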
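The ordering difference itself is easy to reproduce. The following is only a sketch of the two sort orders, not of how javac or any build tool actually enumerates a lib directory; the class name JarOrderDemo and its helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class JarOrderDemo {
    // Orders names as a case-sensitive comparison would:
    // uppercase letters sort before lowercase ones.
    static List<String> caseSensitive(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Orders names while ignoring case, as a case-insensitive listing would.
    static List<String> caseInsensitive(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("my1.jar", "My2.jar");
        System.out.println("case-sensitive:   " + caseSensitive(jars));   // [My2.jar, my1.jar]
        System.out.println("case-insensitive: " + caseInsensitive(jars)); // [my1.jar, My2.jar]
    }
}
```

Whichever jar comes first supplies the A that the call is compiled against, which is why listing the classpath entries explicitly and in a fixed order removes this source of variation.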

Most probably, the answer is "yes", but for a precise answer one would need to check whether any keys or GUIDs are generated during compilation.

I can't remember a situation where this occurs. For example, the ID used for serialization is hardcoded, i.e. generated by the programmer or the IDE.

P.S. Also JNI can matter.

P.P.S. I found that javac is itself written in Java. This means that it is identical on different platforms, so it would not generate different code without a reason. It could do so only through native calls.

Short Answer - NO


Long Answer

The bytecode need not be the same on different platforms. It is the JRE (Java Runtime Environment) that knows exactly how to execute the bytecode.

If you go through the Java VM specification, you will find that the bytecode is not required to be the same on different platforms.

Going through the class file format, it shows the structure of a class file as

ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
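To make the first three fields of that structure concrete, here is a minimal sketch that reads magic, minor_version and major_version from the start of a class file. The class name ClassHeaderDemo and the hand-built 8-byte header are assumptions for illustration; in practice you would pass in the bytes of a real .class file.

```java
import java.nio.ByteBuffer;

public class ClassHeaderDemo {
    // Returns {minor_version, major_version} from the first 8 bytes of a class file.
    // Class files are big-endian, which is also ByteBuffer's default byte order.
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;      // u2 minor_version
        int major = buf.getShort() & 0xFFFF;      // u2 major_version
        return new int[] { minor, major };
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor 0, major 52 (the version Java 8 emits).
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52
        };
        int[] version = readVersion(header);
        System.out.println("class file format " + version[1] + "." + version[0]); // class file format 52.0
    }
}
```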

Looking at the minor and major version:

minor_version, major_version

The values of the minor_version and major_version items are the minor and major version numbers of this class file. Together, a major and a minor version number determine the version of the class file format. If a class file has major version number M and minor version number m, we denote the version of its class file format as M.m. Thus, class file format versions may be ordered lexicographically, for example, 1.5 < 2.0 < 2.1. A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤ Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support. [1]

Reading more through the footnotes

[1] The Java virtual machine implementation of Sun's JDK release 1.0.2 supports class file format versions 45.0 through 45.3 inclusive. Sun's JDK releases 1.1.X can support class file formats of versions in the range 45.0 through 45.65535 inclusive. Implementations of version 1.2 of the Java 2 platform can support class file formats of versions in the range 45.0 through 46.0 inclusive.

So, investigating all this shows that the class files generated on different platforms need not be identical.

I believe that, if you use the same JDK, the generated bytecode will always be the same, regardless of the hardware and OS used. The bytecode is produced by the Java compiler, which uses a deterministic algorithm to "transform" the source code into bytecode. So the output will always be the same. Under these conditions, only a change to the source code will affect the output.

Java allows you to write and compile code on one platform and run it on a different platform. AFAIK, this is possible only when the class files generated on different platforms are the same, or technically the same, i.e. identical.

Edit

What I mean by "technically the same" is that the files don't need to be exactly the same if you compare them byte by byte.

So, as per the specification, the .class file of a class compiled on different platforms doesn't need to match byte for byte.

Let's put it this way:

I can easily produce an entirely conforming Java compiler that never produces the same .class file twice, given the same .java file.

I could do this by tweaking all kinds of bytecode construction or by simply adding superfluous attributes to my method (which is allowed).

Given that the specification does not require the compiler to produce byte-for-byte identical class files, I'd avoid depending on such a result.

However, the few times that I've checked, compiling the same source file with the same compiler with the same switches (and the same libraries!) did result in the same .class files.
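A check of this kind can be scripted with the standard javax.tools API. This is a rough sketch rather than a proof of anything: it compiles one throwaway source file twice on the same machine and compares the bytes, and it assumes it runs on a full JDK (on a plain JRE, ToolProvider.getSystemJavaCompiler() returns null). The names CompileTwiceDemo and Hello are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileTwiceDemo {
    // Compiles the given source into outDir and returns the bytes of Hello.class.
    static byte[] compile(Path source, Path outDir) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system compiler available; run on a JDK");
        }
        int status = javac.run(null, null, null, "-d", outDir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return Files.readAllBytes(outDir.resolve("Hello.class"));
    }

    // Compiles the same source twice into separate directories and compares the results.
    static boolean sameOutputTwice() {
        try {
            Path work = Files.createTempDirectory("javac-repro");
            Path source = work.resolve("Hello.java");
            String code = "public class Hello { int twice(int x) { return 2 * x; } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));
            byte[] first = compile(source, Files.createDirectory(work.resolve("out1")));
            byte[] second = compile(source, Files.createDirectory(work.resolve("out2")));
            return Arrays.equals(first, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("identical class files: " + sameOutputTwice());
    }
}
```

A result of true here only describes this compiler, these switches and this machine; it says nothing about what the specification guarantees.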

Update: I've recently stumbled over this interesting blog post about the implementation of switch on String in Java 7. In this blog post, there are some relevant parts that I'll quote here (emphasis mine):

In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.

This pretty clearly illustrates the issue: The compiler is not required to act in a deterministic manner, as long as it matches the spec. The compiler developers, however, realize that it's generally a good idea to try (provided it's not too expensive, probably).
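The difference the quote relies on can be seen in a few lines. This sketch is not javac's code; the class IterationOrderDemo and the sample labels are invented. It only shows that LinkedHashMap iterates in insertion order, while HashMap makes no promise about order at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    // Fills the given map with the labels in list order and returns the keys
    // in the order the map iterates over them.
    static List<String> iterationOrder(Map<String, Integer> map, List<String> labels) {
        for (int i = 0; i < labels.size(); i++) {
            map.put(labels.get(i), i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("zeta", "alpha", "mid");
        // LinkedHashMap: always insertion order, independent of hash codes.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>(), labels));
        // HashMap: order depends on hashing details and is not specified.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>(), labels));
    }
}
```

If a compiler emitted, say, constant pool entries in HashMap iteration order, a change in the JDK's hashing could reorder its output without any change to the source.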

Overall, I'd have to say there is no guarantee that the same source will produce the same bytecode when compiled by the same compiler but on a different platform.

I'd look into scenarios involving different languages (code-pages), for example Windows with Japanese language support. Think multi-byte characters; unless the compiler always assumes it needs to support all languages it might optimize for 8-bit ASCII.
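As a concrete illustration of the code-page concern: javac has to decode the source bytes into characters, and it takes the charset from the -encoding option or, when that is absent, from a default that on older JDKs followed the platform. The sketch below is not javac itself; it just decodes the same bytes with two charsets, and the class name SourceEncodingDemo is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {
    // Decodes the raw bytes of a source fragment using the given charset.
    static String decode(byte[] sourceBytes, Charset charset) {
        return new String(sourceBytes, charset);
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 editor writes for the single character U+00E9 (e with acute).
        byte[] onDisk = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(onDisk, StandardCharsets.UTF_8);
        String asLatin1 = decode(onDisk, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + asUtf8.length() + " char(s)");   // 1 char(s)
        System.out.println("decoded as ISO-8859-1: " + asLatin1.length() + " char(s)"); // 2 char(s)
    }
}
```

A string literal containing that character would therefore end up as a different constant in the class file depending on the charset in effect, which is why passing -encoding explicitly is a common way to rule this out.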

There is a section on binary compatibility in the Java Language Specification.

Within the framework of Release-to-Release Binary Compatibility in SOM (Forman, Conner, Danforth, and Raper, Proceedings of OOPSLA '95), Java programming language binaries are binary compatible under all relevant transformations that the authors identify (with some caveats with respect to the addition of instance variables). Using their scheme, here is a list of some important binary compatible changes that the Java programming language supports:

• Reimplementing existing methods, constructors, and initializers to improve performance.

• Changing methods or constructors to return values on inputs for which they previously either threw exceptions that normally should not occur or failed by going into an infinite loop or causing a deadlock.

• Adding new fields, methods, or constructors to an existing class or interface.

• Deleting private fields, methods, or constructors of a class.

• When an entire package is updated, deleting default (package-only) access fields, methods, or constructors of classes and interfaces in the package.

• Reordering the fields, methods, or constructors in an existing type declaration.

• Moving a method upward in the class hierarchy.

• Reordering the list of direct superinterfaces of a class or interface.

• Inserting new class or interface types in the type hierarchy.

This chapter specifies minimum standards for binary compatibility guaranteed by all implementations. The Java programming language guarantees compatibility when binaries of classes and interfaces are mixed that are not known to be from compatible sources, but whose sources have been modified in the compatible ways described here. Note that we are discussing compatibility between releases of an application. A discussion of compatibility among releases of the Java SE platform is beyond the scope of this chapter.

Firstly, there's absolutely no such guarantee in the spec. A conforming compiler could stamp the time of compilation into the generated class file as an additional (custom) attribute, and the class file would still be correct. It would however produce a byte-level different file on every single build, and trivially so.

Secondly, even without such nasty tricks about, there's no reason to expect a compiler to do exactly the same thing twice in a row unless both its configuration and its input are identical in the two cases. The spec does describe the source filename as one of the standard attributes, and adding blank lines to the source file could well change the line number table.

Thirdly, I've never encountered any difference in build due to the host platform (other than that which was attributable to differences in what was on the classpath). The code which would vary based on platform (i.e., native code libraries) isn't part of the class file, and the actual generation of native code from the bytecode happens after the class is loaded.

Fourthly (and most importantly) it reeks of a bad process smell (like a code smell, but for how you act on the code) to want to know this. Version the source if possible, not the build, and if you do need to version the build, version at the whole-component level and not on individual class files. For preference, use a CI server (such as Jenkins) to manage the process of turning source into runnable code.

For the question:

"What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?"

The Cross-Compilation example shows how to use the javac option -target version.

This flag generates class files that are compatible with the Java version specified when invoking the command. Hence the class files will differ depending on the value supplied to this option during compilation.

I would put it another way.

First, I think the question is not about being deterministic:

Of course it is deterministic: randomness is hard to achieve in computer science, and a compiler has no reason to introduce it here.

Second, if you reformulate it as "how similar are the bytecode files for the same source file?", then no, you can't rely on them being similar.

A good way of checking this is to leave the .class (or .pyc in my case) files in your git stage. You'll find that, across the different computers in your team, git reports changes in the .pyc files even when the .py file was not changed (and the .pyc was recompiled anyway).

At least that's what I observed. So put *.pyc and *.class in your .gitignore!

There are two questions.

Can there be a difference depending on the operating system or hardware?

This is a theoretical question, and the answer is clearly, yes, there can be. As others have said, the specification does not require the compiler to produce byte-for-byte identical class files.

Even if every compiler currently in existence produced the same byte code in all circumstances (different hardware, etc.), the answer tomorrow might be different. If you never plan to update javac or your operating system, you could test that version's behavior in your particular circumstances, but the results might be different if you go from, for example, Java 7 Update 11 to Java 7 Update 15.

What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?

That's unknowable.

I don't know if configuration management is your reason for asking the question, but it's an understandable reason to care. Comparing bytecode is a legitimate IT control, but only to determine whether the class files changed, not to determine whether the source files did.
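Such a control usually compares digests rather than raw files. A minimal sketch, assuming SHA-256 is an acceptable digest for the purpose and with the class name ClassFileDigestDemo invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ClassFileDigestDemo {
    // Returns the SHA-256 digest of the given bytes as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of two builds of the same class file.
        byte[] buildA = "same bytes".getBytes(StandardCharsets.UTF_8);
        byte[] buildB = "same bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println("class file unchanged: " + sha256Hex(buildA).equals(sha256Hex(buildB)));
    }
}
```

A matching digest tells you the class file is byte-for-byte the same; a mismatch tells you only that the output changed, which, as discussed above, does not by itself mean the source did.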