# What is the difference between LR, SLR, and LALR parsers?

What is the actual difference between LR, SLR, and LALR parsers? I know that SLR and LALR are types of LR parsers, but what is the actual difference as far as their parsing tables are concerned?

And how can one show whether a grammar is LR, SLR, or LALR? For an LL grammar we just have to show that no cell of the parsing table contains multiple production rules. Are there similar rules for LALR, SLR, and LR?

For example, how can we show that the grammar

```
S --> Aa | bAc | dc | bda
A --> d
```

is LALR(1) but not SLR(1)?

**EDIT (ybungalobill)**: I didn't get a satisfactory answer about the difference between LALR and LR. So LALR's tables are smaller in size, but it can recognize only a subset of LR grammars. Can someone elaborate on the difference between LALR and LR, please? LALR(1) and LR(1) will be sufficient for an answer. Both of them use 1 token of look-ahead and *both* are table driven! How are they different?

Solution

SLR, LALR and LR parsers can all be implemented using exactly the same table-driven machinery.

Fundamentally, the parsing algorithm collects the next input token T, and consults the current state S (and associated lookahead, GOTO, and reduction tables) to decide what to do:

- SHIFT: If the current table says to SHIFT on the token T, the pair (S,T) is pushed onto the parse stack, the state is changed according to what the GOTO table says for the current token (e.g., GOTO(T)), another input token T' is fetched, and the process repeats.
- REDUCE: Every state has 0, 1, or many possible reductions that might occur in the state. If the parser is LR or LALR, the token is checked against the lookahead sets for all valid reductions for the state. If the token matches a lookahead set for a reduction for grammar rule G = R1 R2 .. Rn, a stack reduction and shift occurs: the semantic action for G is called, the stack is popped n times (once for each of R1 .. Rn), the pair (S,G) is pushed onto the stack, the new state S' is set to GOTO(G), and the cycle repeats with the same token T. If the parser is an SLR parser, there is at most one reduction rule for the state, so the reduction action can be done blindly without searching to see which reduction applies. It is useful for an SLR parser to know whether there *is* a reduction or not; this is easy to tell if each state explicitly records the number of reductions associated with it, and that count is needed for the L(AL)R versions in practice anyway.
- ERROR: If neither SHIFT nor REDUCE is possible, a syntax error is declared.
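
The SHIFT / REDUCE / ERROR cycle can be sketched in a few lines of Python. This is a minimal illustration rather than a generated parser: the toy grammar `S -> ( S ) | x`, the hand-built `ACTION` and `GOTO` tables, and the names `RULES` and `parse` are all assumptions made for this example. The stack holds only state numbers, a common simplification of pushing (state, symbol) pairs.

```python
# Hypothetical toy grammar (not from the post):
#   rule 1: S -> ( S )      rule 2: S -> x
RULES = {1: ("S", 3), 2: ("S", 1)}  # rule number -> (left-hand side, body length)

# Hand-built tables for that grammar; "$" marks end of input.
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def parse(tokens):
    """Run the table-driven loop; return the rule numbers reduced, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of state numbers; the top is the current state
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = ACTION.get((state, tok))
        if action is None:                 # ERROR: neither shift nor reduce applies
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        kind, arg = action
        if kind == "shift":                # SHIFT: push the new state, fetch next token
            stack.append(arg)
            pos += 1
        elif kind == "reduce":             # REDUCE: pop the rule body, then GOTO on its lhs
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)         # a semantic action would run here
        else:                              # accept
            return reductions


print(parse("((x))"))   # [2, 1, 1]
```

Nothing in `parse` depends on how the tables were produced; an SLR, LALR, or canonical LR generator would only change the contents of `ACTION` and `GOTO`.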

So, if they all use the same machinery, what's the point?

The purported value in SLR is its simplicity in implementation; you don't have to scan through the possible reductions checking lookahead sets because there is at most one, and this is the only viable action if there are no SHIFT exits from the state. Which reduction applies can be attached specifically to the state, so the SLR parsing machinery doesn't have to hunt for it. In practice L(AL)R parsers handle a usefully larger set of languages, and are so little extra work to implement that nobody implements SLR except as an academic exercise.

The difference between LALR and LR has to do with the table *generator*. LR parser generators keep track of all possible reductions from specific states and their precise lookahead sets; you end up with states in which every reduction is associated with its exact lookahead set from its left context. This tends to build rather large sets of states. LALR parser generators are willing to combine states if the GOTO tables and lookahead sets for reductions are compatible and don't conflict; this produces considerably smaller numbers of states, at the price of not being able to distinguish certain symbol sequences that LR can distinguish. So, LR parsers can parse a larger set of languages than LALR parsers, but have very much bigger parser tables. In practice, one can find LALR grammars which are close enough to the target languages that the size of the state machine is worth optimizing; the places where the LR parser would be better are handled by ad hoc checking outside the parser.
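
To make the generator-side difference concrete, here is a hedged sketch that applies it to the grammar from the question. It builds the canonical LR(1) item sets, merges sets with the same core (the LALR step), and looks for conflicts. For SLR(1) it uses the textbook rule, which reduces when the next token is in FOLLOW of the rule's left-hand side. All function names are made up for this example, and the FIRST/FOLLOW code assumes a grammar with no empty rules, which holds here.

```python
from collections import defaultdict

# Grammar from the question, augmented with S' -> S.
GRAMMAR = [("S'", ("S",)), ("S", ("A", "a")), ("S", ("b", "A", "c")),
           ("S", ("d", "c")), ("S", ("b", "d", "a")), ("A", ("d",))]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def fixpoint_sets():
    """FIRST and FOLLOW by fixpoint iteration (valid only without empty rules)."""
    first, follow = defaultdict(set), defaultdict(set)
    for _, rhs in GRAMMAR:
        for sym in rhs:
            if sym not in NONTERMS:
                first[sym] = {sym}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            updates = [(first[lhs], first[rhs[0]])]
            for i, sym in enumerate(rhs):
                if sym in NONTERMS:
                    src = first[rhs[i + 1]] if i + 1 < len(rhs) else follow[lhs]
                    updates.append((follow[sym], src))
            for target, src in updates:
                if not src <= target:
                    target |= src
                    changed = True
    return first, follow

FIRST, FOLLOW = fixpoint_sets()

def next_symbol(item):
    """Symbol after the dot of an item (rule index, dot, lookahead), or None."""
    p, dot, _ = item
    rhs = GRAMMAR[p][1]
    return rhs[dot] if dot < len(rhs) else None

def closure(items):
    items, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs, sym = GRAMMAR[p][1], next_symbol((p, dot, la))
        if sym in NONTERMS:
            lookaheads = FIRST[rhs[dot + 1]] if dot + 1 < len(rhs) else {la}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == sym:
                    for b in lookaheads:
                        if (q, 0, b) not in items:
                            items.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(items)

def lr1_states():
    """Canonical LR(1) collection: every state keeps exact lookaheads."""
    start = closure({(0, 0, "$")})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for sym in {next_symbol(it) for it in state} - {None}:
            target = closure({(p, dot + 1, la) for p, dot, la in state
                              if next_symbol((p, dot, la)) == sym})
            if target not in states:
                states.add(target)
                work.append(target)
    return states

def merge_by_core(states):
    """LALR step: union all states whose items agree once lookaheads are ignored."""
    merged = defaultdict(set)
    for state in states:
        merged[frozenset((p, dot) for p, dot, _ in state)] |= state
    return [frozenset(s) for s in merged.values()]

def conflicts(states, lookaheads_for):
    """Return (state core, token) pairs with a shift/reduce or reduce/reduce clash."""
    found = set()
    for state in states:
        shifts = {next_symbol(it) for it in state} - NONTERMS - {None}
        reduce_on = defaultdict(set)
        for p, dot, la in state:
            if next_symbol((p, dot, la)) is None:
                for tok in lookaheads_for(p, la):
                    reduce_on[tok].add(p)
        for tok, prods in reduce_on.items():
            if tok in shifts or len(prods) > 1:
                found.add((tuple(sorted({(p, dot) for p, dot, _ in state})), tok))
    return found

lr1 = lr1_states()
lalr = merge_by_core(lr1)
exact = lambda p, la: {la}                       # LR(1)/LALR(1): item's own lookahead
by_follow = lambda p, la: FOLLOW[GRAMMAR[p][0]]  # SLR(1): FOLLOW of the rule's lhs
print("LR(1) conflicts:  ", conflicts(lr1, exact))
print("LALR(1) conflicts:", conflicts(lalr, exact))
print("SLR(1) conflict tokens:", sorted(t for _, t in conflicts(lalr, by_follow)))
```

For this grammar the LR(1) and LALR(1) checks report no conflicts, while the SLR(1) check reports conflicts on `a` and `c`: FOLLOW(A) is {a, c}, so in the state reached after `d` SLR cannot choose between reducing `A --> d` and shifting `c`, and in the state reached after `b d` it cannot choose between reducing and shifting `a`. The exact lookaheads ({a} in the first state, {c} in the second) remove both clashes. No two LR(1) states share a core here, so the merge changes nothing and the grammar is LALR(1) but not SLR(1).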

So: All three use the same machinery. SLR is "easy" in the sense that you can ignore a tiny bit of the machinery, but it is just not worth the trouble. LR parses a broader set of languages, but the state tables tend to be pretty big. That leaves LALR as the practical choice.

Having said all this, it is worth knowing that GLR parsers can parse any context-free language, using more complicated machinery *but exactly the same tables* (including the smaller version used by LALR). This means that GLR is strictly more powerful than LR, LALR and SLR; pretty much if you can write a standard BNF grammar, GLR will parse according to it. The difference in the machinery is that GLR is willing to try multiple parses when there are conflicts between the GOTO table and/or lookahead sets. (How GLR does this efficiently is sheer genius [not mine] but won't fit in this SO post).
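
The idea of "trying multiple parses" can be shown with a deliberately naive sketch: the tables have the same shape as before, but a cell may hold several actions, and the driver forks a copy of the stack for each one. This is not how a production GLR parser works; real implementations share stack prefixes in a graph-structured stack to stay efficient, while this version simply explores every alternative. The ambiguous grammar `E -> E + E | n`, the hand-built tables, and the name `count_parses` are assumptions for illustration.

```python
# Hypothetical ambiguous grammar:
#   rule 1: E -> E + E      rule 2: E -> n
RULES = {1: ("E", 3), 2: ("E", 1)}

# Same table shape as a deterministic LR parser, except that a cell holds a
# list of actions. State 4 on "+" has a shift/reduce conflict, kept on purpose.
ACTIONS = {
    (0, "n"): [("shift", 2)],
    (1, "+"): [("shift", 3)], (1, "$"): [("accept", None)],
    (2, "+"): [("reduce", 2)], (2, "$"): [("reduce", 2)],
    (3, "n"): [("shift", 2)],
    (4, "+"): [("shift", 3), ("reduce", 1)], (4, "$"): [("reduce", 1)],
}
GOTO = {(0, "E"): 1, (3, "E"): 4}


def count_parses(tokens):
    """Count the distinct parses by following every action in every cell."""
    tokens = list(tokens) + ["$"]
    work = [((0,), 0)]           # pending configurations: (state stack, input position)
    accepted = 0
    while work:
        stack, pos = work.pop()
        for kind, arg in ACTIONS.get((stack[-1], tokens[pos]), []):
            if kind == "shift":
                work.append((stack + (arg,), pos + 1))
            elif kind == "reduce":
                lhs, length = RULES[arg]
                base = stack[:len(stack) - length]
                work.append((base + (GOTO[(base[-1], lhs)],), pos))
            else:                # accept: this fork found a complete parse
                accepted += 1
    return accepted              # forks that hit an empty cell just die out


print(count_parses("n+n"))     # 1
print(count_parses("n+n+n"))   # 2: (n+n)+n and n+(n+n)
```

A deterministic LR driver would have to reject this table because of the conflict in state 4; the forking driver treats the conflict as a choice point and lets the input decide which forks survive.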

That for me is an enormously useful fact. I build program analyzers and code transformers, and parsers are necessary but "uninteresting"; the interesting work is what you do with the parsed result, so the focus is on doing the post-parsing work. Using GLR means I can relatively easily build working grammars, compared to hacking a grammar to get it into LALR-usable form. This matters a lot when trying to deal with non-academic languages such as C++ or Fortran, where you literally need thousands of rules to handle the entire language well, and you don't want to spend your life trying to hack the grammar rules to meet the limitations of LALR (or even LR).

As a sort of famous example, C++ is considered to be extremely hard to parse... by guys doing LALR parsing. C++ is straightforward to parse using GLR machinery using pretty much the rules provided in the back of the C++ reference manual. (I have precisely such a parser, and it handles not only vanilla C++, but also a variety of vendor dialects as well. This is only possible in practice because we are using a GLR parser, IMHO).

[EDIT November 2011: We've extended our parser to handle all of C++11. GLR made that a lot easier to do. EDIT Aug 2014: Now handling all of C++17. Nothing broke or got worse, GLR is still the cat's meow.]
