What makes Java easier to parse than C?

I understand that the grammars of C and C++ are context-sensitive, and in particular that C needs a "lexer hack". On the other hand, my impression is that Java can be parsed with only two tokens of look-ahead, despite the two languages being quite similar.
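To make that concrete, here is a minimal sketch of what the "lexer hack" amounts to (the names and data structures are made up for illustration, not taken from any real compiler): the lexer has to consult the symbol table the parser is building in order to decide whether an identifier is a type name.

    #include <iostream>
    #include <set>
    #include <string>

    enum class TokenKind { Identifier, TypeName };

    // Filled in by the parser as it processes typedef declarations.
    std::set<std::string> typedef_names;

    // The feedback loop from parser to lexer is what makes C context-sensitive:
    // the same spelling lexes differently depending on earlier declarations.
    TokenKind classify(const std::string& spelling) {
        return typedef_names.count(spelling) ? TokenKind::TypeName
                                             : TokenKind::Identifier;
    }

    int main() {
        typedef_names.insert("foo");  // as if the parser just saw "typedef int foo;"
        std::cout << (classify("foo") == TokenKind::TypeName) << "\n";  // prints 1
        std::cout << (classify("a")   == TokenKind::TypeName) << "\n";  // prints 0
    }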

What would you have to change about C to make it easier to parse?

I ask because every example of C's context-sensitivity I have seen is technically allowed but awfully strange. For example,

    foo (a);

could be calling the void function foo with argument a. Or, it could be declaring a as an object of type foo, but then you could just as easily drop the parentheses. Part of the reason is that C's grammar has a "direct declarator" production rule that serves the dual purpose of declaring both functions and variables.
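To spell out the two readings (a contrived fragment; only one of the two declarations of foo can be in scope at a time):

    // Reading 1: 'foo' is a function, so "foo (a);" is a call.
    //     void foo(int);
    //     int a;
    //     foo (a);             // call foo with argument a
    //
    // Reading 2: 'foo' is a typedef, so the very same tokens declare a variable.
    typedef double foo;

    int main() {
        foo (a);                // declares 'a' of type foo; the parentheses are redundant
        a = 1.5;                // 'a' is now an ordinary variable in scope
        return 0;
    }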

The Java grammar, on the other hand, has separate production rules for variable declarations and function declarations. If you write

    foo a;

then you know it is a variable declaration, and foo can be unambiguously parsed as a type name. This might not be valid code if the class foo isn't defined somewhere in scope, but that is a job for semantic analysis, which can be performed in a later compiler pass.

I have seen it said that C is hard to parse because of typedef, but you can declare your own types in Java too. Which C grammar rules, besides direct_declarator, are at fault?
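The usual typedef illustration I have seen is pointer declarations (again a contrived fragment, in C/C++ syntax): the same tokens parse completely differently depending on whether foo names a type, whereas in Java they could only ever be an expression.

    typedef int foo;

    void f() {
        foo * bar;      // because 'foo' is a typedef, this declares 'bar'
        (void)bar;      //   as a pointer to foo
    }

    // If 'foo' and 'bar' were both variables, the same statement "foo * bar;"
    // would instead be an expression multiplying them and discarding the result.
    // In Java, "foo * bar;" can never be a declaration, so the parser needs no
    // symbol table to choose a production; what 'foo' means is left to later
    // semantic analysis.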

Parsing C++ is getting hard. Parsing Java is getting to be just as hard.

See this SO answer discussing why C (and C++) is "hard" to parse. The short summary is that C and C++ grammars are inherently ambiguous; they will give you multiple parses and you must use context to resolve the ambiguities. People then make the mistake of assuming you have to resolve ambiguities as you parse; not so, see below. If you insist on resolving ambiguities as you parse, your parser gets more complicated and that much harder to build; but that complexity is a self-inflicted wound.
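As a concrete illustration of what "use context to resolve the ambiguities" means (a contrived fragment, not from any particular front end): the grammar alone admits two parse trees for the statement below, and only the declarations in scope tell you which one is meant. A GLR parser can keep both parses and let name resolution throw one away later.

    typedef int a;        // context A: 'a' names a type
    // int a(int);        // context B: 'a' names a function (mutually exclusive)

    int use(int b) {
        return (a)(b);    // context A: a cast of b to type a
                          // context B: a call of function a with argument b
    }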

IIRC, Java 1.4's "obvious" LALR(1) grammar was not ambiguous, so it was "easy" to parse. I'm not so sure that modern Java is free of at least long-distance local ambiguities; there's always the problem of deciding whether "...>>" closes off two nested generic argument lists or is a right-shift operator. I suspect modern Java does not parse with LALR(1) anymore.
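The C++ flavor of that problem looks like the following (C++ shown because that is where the grammar had to be patched; Java generics pose the same lexical question for something like Map<String, List<Integer>>):

    #include <vector>

    int main() {
        // In C++03 the ">>" here lexed as a single right-shift token, making
        // the declaration a syntax error; C++11 added a special rule to split
        // it when it closes two template argument lists.
        std::vector<std::vector<int>> v;

        // The same two characters really are a shift operator here, so the
        // decision cannot be made by the lexer in isolation.
        int x = 256 >> 2;

        (void)v;
        (void)x;
        return 0;
    }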

But one can get past the parsing problem for both languages by using strong parsers (or weak parsers plus the context-collection hacks that C and C++ front ends mostly use now). C and C++ have the additional complication of a preprocessor, which is more complicated in practice than it looks. One claim is that C and C++ parsers are so hard they have to be written by hand. It isn't true; you can build Java and C++ parsers just fine with GLR parser generators.
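Here is one small taste of why the preprocessor makes things worse (a deliberately ugly but legal fragment): the raw source does not match the C or C++ grammar at all until macro expansion and conditional compilation have run, so a parser cannot simply ignore it.

    #include <cstdio>

    #define BEGIN_BLOCK {
    #define END_BLOCK   }

    #ifdef USE_LONG_COUNTER
    typedef long counter_t;
    #else
    typedef int counter_t;
    #endif

    // Before preprocessing, "int main() BEGIN_BLOCK" is not a valid function
    // definition; after preprocessing it is.
    int main()
    BEGIN_BLOCK
        counter_t c = 42;
        std::printf("%d\n", static_cast<int>(c));
        return 0;
    END_BLOCK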

But parsing isn't really where the problem is.

Once you parse, you will want to do something with the AST/parse tree. In practice, you need to know, for every identifier, what its definition is and where it is used ("name and type resolution"; sloppily, building symbol tables). This turns out to be a LOT more work than getting the parser right, compounded by inheritance, interfaces, overloading and templates, and confounded by the fact that the semantics for all this is written in informal natural language spread across tens to hundreds of pages of the language standard. C++ is really bad here. Java 7 and 8 are getting to be pretty awful from this point of view. (And symbol tables aren't all you need; see my bio for a longer essay on "Life After Parsing").
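To give a flavor of why this dwarfs parsing, consider a toy C++ example (the real rules run to many pages of the standard): deciding which f each call refers to already involves name hiding, a using-declaration, template instantiation and overload ranking, none of which the parser needs and all of which the symbol-table machinery must get right.

    #include <iostream>

    struct Base {
        void f(double) { std::cout << "Base::f(double)\n"; }
    };

    struct Derived : Base {
        using Base::f;                 // without this, Base::f would be hidden entirely
        void f(int) { std::cout << "Derived::f(int)\n"; }
    };

    template <typename T>
    void call(T& obj) {
        // Which f this calls depends on T's class hierarchy, name hiding,
        // using-declarations and overload ranking.
        obj.f(1);
    }

    int main() {
        Derived d;
        call(d);     // Derived::f(int): exact match wins
        d.f(2.5);    // Base::f(double): visible only via the using-declaration
        return 0;
    }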

Most folks struggle with the pure parsing part (often never finishing; check SO itself for the many, many questions about how to build working parsers for real languages), so they don't ever see life after parsing. And then we get folk theorems about what is hard to parse and no signal about what happens after that stage.

Fixing C++ syntax won't get you anywhere.

Regarding changing the C++ syntax: you'll find you need to patch a lot of places to take care of the variety of local and real ambiguities in any C++ grammar. If you insist, the following list might be a good starting place. I contend there is no point in doing this if you are not the C++ standards committee; if you did so, and built a compiler using that, nobody sane would use it. There's too much invested in existing C++ applications to switch for convenience of the guys building parsers; besides, their pain is over and existing parsers work fine.

You may want to write your own parser. OK, that's fine; just don't expect the rest of the community to let you change the language they must use to make it easier for you. They all want it easier for them, and that's to use the language as documented and implemented.