parsingll-grammarearley-parser

Is there a simple example how Earley parser outperforms (or allows less verbose grammar than) LL(k) parser?


I've read that Earley is easier to use and that it can handle more cases than LL(k) (see https://www.wikiwand.com/en/Earley_parser):

Earley parsers are appealing because they can parse all context-free languages, unlike LR parsers and LL parsers, which are more typically used in compilers but which can only handle restricted classes of languages.

But I cannot find a simple example that shows Earley has an advantage over LL(k).


Solution

  • That quote (and the Wikipedia entry it comes from) do not claim that the Earley parsing algorithm is faster than LL(k), nor that it allows for shorter grammars. What it claims is that the Earley parser can parse grammars which cannot be parsed by LL(k) or LR(k) grammars. That is true; the Earley parser can parse any context-free-grammar.

    A simple example of a grammar which cannot be parsed by an LR(k) parser is the language of palindromes (sentences which read the same left-to-right as right-left). Here's a grammar for even-length palindromes over the alphabet {a, b}:

    S → ε
    S → a S a
    S → b S b
    

    You can add odd-length palindromes by adding two more productions (S → a and S → b), and it should be easy to see how to extend it to a larger alphabet.

    Note that the grammar is unambiguous; there is only one parse tree for every valid input. That's not an issue for Earley parsing -- the parser can produce a representation of all possible parses from an ambiguous grammar, although it might take longer than parsing an unambiguous grammar. However, LR(k) parsers only exist for unambiguous grammars (and, as shown by the above example, not for all unambiguous grammars).

    In the above, I only mention LR(k) parsing, because LR(k) parsing is strictly more powerful than LL(k) parsing. Any grammar with an LL(k) parser can be parsed with an LR(k) parser, so if there is no LR(k) parser, there is also no LL(k) parser. However, the converse is not true: there are grammars which can be parsed by an LR(k) parser for which no LL(k') grammar exists, for any value of k'. Moreover, there are languages which have an LR(1) grammar and which have no LL(k) grammar, for any value of k. The proofs of these assertions can be found in any good textbook on automaton theory.

    Any LL(k) language can be parsed in time linear to the length of the input using LL(k), LR(k) or Earley parsers. That is, the three algorithms have asymptotically equal computational complexity for grammars which qualify for all three algorithms. But asymptotic complexity isn't the full story. If you have an LR(1) grammar, it is probably faster (by a constant factor) to use an LR(1) parser, because the individual steps are just lookups in precomputed tables.

    If the grammar is also LL(1), then a well-written recursive descent or table-driven LL(1) is also probably faster. (Comparisons between LR(1) and LL(1) parsers are not as clear-cut; a lot will depend on the quality of the parser code.)

    But for values of k larger than 1, the Earley algorithm could well be the best choice, because of the size of LR(k) decision tables for larger values of k.