compiler-constructionabstract-syntax-treepretty-printambiguous-grammar

Pretty printing ambiguous grammar


I have implemented parser combinators, that can parse grammars that may contain ambiguity. An error is given when the grammar is ambiguous, but going in the other direction is proving to be more difficult. The question is how to pretty print an abstract syntax tree to a potentially ambiguous grammar with a minimal number of parentheses. Using operator precedence levels helps but is not a panacea. Inside the same precedence level, the problem persists.

The exact operators are not known until runtime and can change during execution when the user introduces a new operator. I have support for prefix, postfix, and infix (left, right, and non-associative) operators. Infix left and postfix operators mix at a precedence level at the same time. The same applies to infix right and prefix operators. The operators can also embed full expressions thus if-then-else and if-then could both be implemented as prefix operators. (although it might not be a smart move.)

Here is an example using the mentioned if-then-else and if-then operators, that are here assumed to be at the same precedence level. Obviously, the expression if a then if b then c else d is ambiguous as it can be interpreted as if a then (if b then c) else d or if a then (if b then c else d). During pretty-printing, the algorithm should know to use parentheses even though both operators are at the same precedence level and have compatible associativity (to the right).

A cautionary example: Add another prefix operator say inc of the same precedence as if-then-else and if-then. Now assume an arbitrary set P ⊂ H x O where H is the set of operator holes and O is the set of operators. The set is meant to be a relation that tells when parentheses need to be added. Examine the expressions if a then inc b else c and if a then (inc if b then c) else d. The first requires (if-then-else.2, inc) to not be in P and the second requires the opposite. This contradicts the assumption the problem can be solved by some kind of relation or order. One could try to say let (inc.1, if-then) be in P making the latter expression if a then inc (if b then c) else d, but then inc if a then b becomes inc (if a then b) which has too many parentheses.

To my knowledge, the grammar is context-free. I'm a little shaky on the definition though.

The parser is loosely based upon a paper here. I am using Haskell.

Update: As demonstrated by Maya, the problem is insolvable in general. I would be willing to accept an algorithm that may fail. If even that is not enough to make things practical, a good heuristic will do.


Solution

  • In full generality, this is impossible. Consider the operators A_B, _C_, A_C_B. The expression A_C_B 1 2 (i.e. A 1 C 2 B) is impossible to parenthesize such that it cannot be parsed as A (1 C 2) B.