parsinggrammarlemon

Parse grammar alternating and repeating


I was able to add support to my parser's grammar for alternating characters (e.g. ababa or baba) by following along with this question.

I'm now looking to extend that by allowing repeats of characters.

For example, I'd like to be able to support abaaabab and aababaaa as well. In my particular case, only the a is allowed to repeat but a solution that allows for repeating b's would also be useful.

Given the rules from the other question:

expr ::= A | B
A ::= "a" B | "a"
B ::= "b" A | "b"

... I tried extending it to support repeats, like so:

expr ::= A | B

# support 1 or more "a"
A_one_or_more = A_one_or_more "a" | "a"

A ::= A_one_or_more B | A_one_or_more
B ::= "b" A | "b"

... but that grammar is ambiguous. Is it possible for this to be made unambiguous, and if so could anyone help me disambiguate it?

I'm using the lemon parser which is an LALR(1) parser.


Solution

  • The point of parsing, in general, is to parse; that is, determine the syntactic structure of an input. That's significantly different from simply verifying that an input belongs to a language.

    For example, the language which consists of arbitrary repetitions of a and b can be described with the regular expression (a|b)*, which can be written in BNF as

    S ::= /* empty */  | S a | S b
    

    But that probably does not capture the syntactic structure you are trying to defind. On the other hand, since you don't specify that structure, it is hard to know.

    Here are a couple more possibilities, which build different parse trees:

    S ::= E | S E
    E ::= A b | E b
    A ::= a | A a
    

    S ::= E | S E
    E ::= A B
    A ::= a | A a
    B ::= b | B b
    

    When writing a grammar to parse a language, it is useful to start by drawing your proposed parse trees. Usually, you can write the grammar directly from the form of the trees, which shows that a formal grammar is primarily a documentation tool, since it clearly describes the language in a way that informal descriptions cannot. Using a parser generator to turn that grammar into a parser ensures that the parser implements the described language. Or, at least, that is the goal.