I am writing a JISON file for a class and trying to use nonterminals in place of declaring associativity for operators but am utterly lost on what the errors really mean, as this is a one time assignment for a class and I haven't found amazing examples of using nonterminals for this use case.
My JISON code:
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[0-9]+("."[0-9]+)?\b return 'NUMBER'
"*" return '*'
"/" return '/'
"-" return '-'
"+" return '+'
"^" return '^'
"!" return '!'
"%" return '%'
"(" return '('
")" return ')'
"PI" return 'PI'
"E" return 'E'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
%start expressions
%% /* language grammar */
expressions
: e EOF
{ typeof console !== 'undefined' ? console.log($1) : print($1);
return $1; }
;
e
: NegExp
{$$ = $1;}
| MulExp
{$$ = $1;}
| PowExp
{$$ = $1;}
| UnaryExp
{$$ = $1;}
| RootExp
{$$ = $1;}
;
RootExp
: ’(’ RootExp ’)’
{$$ = ’(’ + $2 + ’)’;}
| NUMBER
{$$ = Number(yytext);}
| E
{$$ = ’E’;}
| PI
{$$ = ’PI’;}
;
UnaryExp
: UnaryExp '!'
{$$ = '(' + $1 + '!' + ')';}
| UnaryExp '%'
{$$ = '(' + $1 + '%' + ')';}
| '-' UnaryExp
{$$ = '(-' + $2 + ')';}
;
NegExp
: NegExp '+' e
{$$ = '(' + $1 + ' + ' + $3 + ')';}
| NegExp '-' e
{$$ = '(' + $1 + ' - ' + $3 + ')';}
;
MulExp
: MulExp '*' e
{$$ = '(' + $1 + ' * ' + $3 + ')';}
| MulExp '/' e
{$$ = '(' + $1 + ' / ' + $3 + ')';}
;
PowExp
: e '^' PowExp
{$$ = '(' + $1 + ' ^ ' + $3 + ')';}
;
And when I run jison filename.jison
I get a slew of errors like:
Conflict in grammar: multiple actions possible when lookahead token is ^ in state 26
- reduce by rule: MulExp -> MulExp / e
- shift token (then go to state 13)
and:
States with conflicts:
State 3
e -> NegExp . #lookaheads= EOF ^ + - * /
NegExp -> NegExp .+ e #lookaheads= EOF + - ^ / *
NegExp -> NegExp .- e #lookaheads= EOF + - ^ / *
Again, I am not looking for someone to do my homework for me, but pointers on where to go or what to do to help debug will be greatly appreciated.
It's true; it is not easy to find examples of expression grammars which resolve ambiguity without using precedence declarations. That's probably because precedence declarations, in this particular use case, are extremely convenient and probably more readable than writing out an unambiguous grammar. The resulting parser is usually slightly more efficient, as well, because it avoids the chains of unit reductions imposed by the usual unambiguous grammar style.
The flip side of this convenience is that it does not help the student achieve an understanding of how grammars actually work, and without that understanding it is very difficult to apply precedence declarations to less clear-cut applications. So the exercise which gave rise to this question is certainly worthwhile.
One place you will find unambiguous expression grammars is in the specifications of (some) programming languages, because the unambiguous grammar does not depend on the precise nature of the algorithm used by parser generators to resolve parsing conflicts. These examples tend to be rather complicated, though, because real programming languages usually have a lot of operators. Even so, the sample C grammar in the jison examples directory does show the standard model for arithmetic expression grammars. The following extract is dramatically simplified, but most of the productions were simply copy-and-pasted from the original. (I removed many operators, most of the precedence levels, and some of the complications dealing with things like cast expressions and the idiosyncratic comma operator, which are surely not relevant here.)
primary_expression
: IDENTIFIER
| CONSTANT
| '(' expression ')'
;
postfix_expression
: primary_expression
| postfix_expression INC_OP
| postfix_expression DEC_OP
;
unary_expression
: postfix_expression
| '-' unary_expression
| INC_OP unary_expression
| DEC_OP unary_expression
;
/* I added this for explanatory purposes. DO NOT USE IT. See the text. */
exponential_expression
: unary_expression
| unary_expression '^' exponential_expression
multiplicative_expression
: exponential_expression
| multiplicative_expression '*' exponential_expression
| multiplicative_expression '/' exponential_expression
| multiplicative_expression '%' exponential_expression
;
additive_expression
: multiplicative_expression
| additive_expression '+' multiplicative_expression
| additive_expression '-' multiplicative_expression
;
expression
: additive_expression
;
C does not have an exponentiation operator, so I added one with right associativity and higher precedence than multiplication, which will serve for explanatory purposes. However, your assignment probably wants it to have higher precedence than unary negation as well, which I didn't do. So I don't recommend using the above directly.
One thing to note in the above model is that every precedence level corresponds to a non-terminal. These non-terminals are linked into an ordered chain using unit productions. We can see this sequence:
expression ⇒ additive_expression ⇒ multiplicative_expression ⇒ exponential_expression ⇒ unary_expression ⇒ postfix_expression ⇒ primary_expression
which indeeds corresponds to the precedence ordering of this grammar.
The other interesting aspect of the grammar is that the left-associative operators (all of them except exponentiation) are implemented with left-recursive productions, while the right-associative operator is implemented with a right-recursive production. This is not a coincidence.
That's the basic model, but it's worth spending a few minutes to try to understand how it actually works, because it turns out to be pretty simple. Let's take a look at one production, for multiplication, and see if we can understand why it implies that exponentiation binds more tightly and addition binds less tightly:
multiplicative_expression: multiplicative_expression '*' exponential_expression
This production is saying that a multiplicative_expression
consists of a *
with a multiplicative_expression
on the left and an exponential_expression
on the right.
Now, what does that mean for 2 + 3 * 4 ^ 2
? 2 + 3
is an additive_expression
, but we can see from the chain of unit productions that multiplicative_expression
does not produce additive_expression
. So the grammar does not include the possibility that 2 + 3
is the phrase matched on the left-hand side of the *
. However, it is perfectly legal for 3
(a CONSTANT
, which is a primary_expression
) to be parsed as the left-hand operand of the multiplication.
Meanwhile, 4 ^ 2
is an exponential_expression
, and our production clearly indicates that an exponential_expression
can be matched on the right of the *
.
A similar argument, examining the addition and exponential expression productions, would show that 3 * 4 ^ 2
(a multiplicative_expression
) can be on the right-hand side of the +
operator, while neither 2 + 3 * 4
(an additive_expression
) nor 3 * 4
(a multiplicative_expression
) can be on the left-hand side of the exponentiation operator.
In other words, this simple grammar defines precisely and unambiguously how the expression must be decomposed. There is only one possible parse tree:
expression
|
add
|
+--------+----------------+
| | |
| | mult
| | |
| | +-------+---------------+
| | | | |
| | | | power
| | | | |
| | | | +-------+-------+
| | | | | | |
add | mult | unary | power
... | ... | ... | ...
| | | | | | |
primary | primary | primary | primary
| | | | | | |
2 + 3 * 4 ^ 2
I hope that helps somewhat.