parsingcompilationcompiler-constructionlr-grammarlalr# Can LR(1) parsing table size be the same as LALR(1) parsing table for some grammar?

I know that LALR(1) grammars are a subset of LR(1) grammars and most of the time LALR(1) parsing table is much smaller than LR(1) parsing table for the same grammar. But I couldn't find the answer on the internet and SO whether a grammar exists for which they have the same parsing table size. It sounds possible, since LALR is essentially "collapsing" compatibile states, merging those that do not conflict, but is there a grammar both LR(1) and LALR(1) with same parsing table size for both?

Solution

Intuitively, an LALR(1) parser is formed by starting with an LR(1) parser and repeatedly merging states together when those states are identical with the exception of lookaheads. Therefore, an LR(1) and LALR(1) parser for a grammar will be the same whenever there aren't any states of this sort to merge together.

In that case, the parsing tables for the two parsers will be completely identical. The GOTO tables will be the same because the two parsers have the same states and same transitions, and the ACTION tables will be the same because the shift and reduce items in each state are the same.

I believe that the specific requirement that has to hold for the LALR(1) and LR(1) automata to be the same is that the grammar's LR(0) and LR(1) parsers must be identical to one another, ignoring lookaheads. Specifically:

- If the LR(0) and LR(1) parsers are the same ignoring lookaheads, then there are no states to combine together in the LR(1) parser, so the LR(1) parser is the same as the LALR(1) parser.
- If the LALR(1) parser and LR(1) parser are the same, then no states in the LR(1) parser would have been combined together. Since the number of states in the LALR(1) parser for a grammar is always the same as the number of LR(0) states, this means that the LR(1) and LR(0) parsers have the same number of states. That means the only way they can differ is in the lookaheads.

So in other words, yes, this is possible. More precisely:

: A grammar G has identical LALR(1) and LR(1) parsers if and only if it has identical LR(0) and LR(1) parsers, ignoring lookaheads.Theorem

This means that any grammar that has this property must be LR(0), in which case you wouldn't want to use an LALR(1) parser. However, not all LR(0) grammars have this property. For example, consider this grammar:

```
S -> aTbT
T -> c
```

Here's the LR(1) parser:

```
(1)
S' -> .S [$]
S -> .aTbT [$]
(2)
S -> a.TbT [$]
T -> .c [b]
(3)
T -> c. [b]
(4)
S -> aT.bT [$]
(5)
S -> aTb.T [$]
T -> .c [$]
(6)
T -> c. [$]
(7)
S -> aTbT. [$]
(8)
S' -> S [$]
```

Here, states (3) and (6) would be combined together in an LALR(1) parser, so the LR(1) and LALR(1) parsers aren't the same. However, you can check that this grammar is indeed LR(0), showing that only a subset of the LR(0) grammars have this property.

- Azure Synapse CSV Parsing Unexpected Token
- Grab the values from url that is in between specific characters BigQuery
- I Need a Human Readable, Yet Parse-able Document Format
- Java Ant <Date> task exception: Cannot be parsed correctly. It should be in 'MM/dd/yyyy hh:mm a' format
- What are the pros and cons of the leading Java HTML parsers?
- Firestore JSON parsing terminated
- Why is string.Empty more recommended than ""?
- How to escape braces (curly brackets) in a format string in .NET
- How to strip all non-alphabetic characters from string in SQL Server?
- Tree of all dependencies in a SQL Server database
- android parsing negative number strings
- Using ANTLR 3.3?
- Date value wrongly formatted
- Parsing JSON array into java.util.List with Gson
- @username Regular Expression for social media with JavaScript
- Content Security Policy report-uri is not being recognized
- Python Json loads() returning string instead of dictionary?
- Parsing JSON in Excel VBA
- SQL: parse the first, middle and last name from a fullname field
- parse URL with JavaScript or jQuery
- Dump all key pair of a json with JQ
- Read data from a txt file into 2D array python
- Blending a LaTeX document with python package pylatexenc
- Unpack / pack operator
- DateTime.TryParseExact constantly returns false
- How to check that a string is parseable to a double?
- Python urlparse -- extract domain name without subdomain
- How to consume W3C EBNF-Notation and produce a parser generator?
- DurationPattern can't parse without separator
- try to parse a simple "\s*identifier\s+identifier\s+identifier\s*" string