LL 和递归下降解析器的区别?

我最近一直在尝试自学解析器(用于语言/上下文无关语法)的工作原理,除了一件事之外,大部分似乎都是有意义的。我的注意力特别集中在 LL (k)语法上,这两个主要算法似乎是 LL 解析器 (使用堆栈/解析表)和 递归下降解析器(简单地使用递归)。

据我所知,递归下降算法适用于所有 LL (k)文法,甚至可能更多,而 LL 解析器适用于所有 LL (k)文法。不过,递归下降解析器分析器的实现显然比 LL 分析器简单得多(就像 LL 分析器比 LR 分析器简单一样)。

因此,我的问题是,当使用这两种算法中的任何一种时,可能会遇到哪些优点/问题?为什么会有人选择 LL 而不是递归下降,因为它工作在相同的语法集上,并且实现起来更加棘手?

30210 次浏览

LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be O(k^n) (where n is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.

On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive-descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive-descent is actually a directly-encoded implementation of PEGs, or Parser Expression Grammar(s). Specifically, the disjunctive operator (a | b) is not commutative, meaning that a | b does not equal b | a. A recursive-descent parser will try each alternative in order. So if a matches the input, it will succeed even if b would have matched the input. This allows classic "longest match" ambiguities like the dangling else problem to be handled simply by ordering disjunctions correctly.

With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling else are no longer solvable with such ease.

As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).

As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.