I am developing a parser for a specific DSL language. Below is a working (but incomplete) grammar:
%{
package main
%}
//%type<stmts> stmts
%type<stmt> stmt
//%type<stmt_if> stmt_if
//%type<exprs> exprs
//%type<compstmt> compstmt
%union {
token Token
terminal Token
//terms Token
stmt_if statement // оператор Если
stmts statements
stmt statement
compstmt statements
}
%token<token> Directive Identifier Procedure Var EndProcedure If Then ElseIf Else EndIf For Each In To Loop EndLoop Break Not
%token<token> Continue Try Catch EndTry Number String Array Function EndFunction Return Throw NeEq Or And True False Undefind Export
%right '='
%left Or
%left And
%left Identifier
//%nonassoc NeEq '>' '<'
%nonassoc NeEq
%left '>' '<'
%left '+' '-'
%left '*' '/' '%'
%%
main: opt_directive invoke {}
;
opt_directive: /* none */
| Directive EOL {}
;
opt_export: /* none */
| Export {}
;
invoke: Function Identifier '(' exprs ')' opt_export opt_stmt EndFunction opt_terms {}
| Procedure Identifier '(' exprs ')' opt_export opt_stmt EndProcedure opt_terms {}
;
opt_stmt : opt_terms{}
| stmts opt_terms{}
;
stmts : stmt {}
| EOL stmt {}
| stmts terminals stmt{}
//| stmts semi stmt {}
;
stmt : Return expr {}
| Throw throw_terms {}
| Break { }
| Continue { }
| expr {}
| stmt_if {}
;
throw_terms : terminal {}
| String terminal {}
;
stmt_if : If expr Then opt_stmt EndIf terminal
{
}
;
expr : Identifier {}
| '(' expr ')' { }
| expr '+' expr { }
| expr '-' expr { }
| expr '*' expr { }
| expr '/' expr { }
| expr '%' expr {}
| String {}
| Number { }
| True { }
| False { }
| Undefind {}
| expr '>' expr { }
| expr '<' expr{}
| expr '=' expr {}
| Identifier '=' expr {}
| expr NeEq expr {}
| expr And expr { }
| expr Or expr {}
| Identifier '(' exprs ')' {}
;
exprs : { }
| expr {}
| exprs ',' opt_terms expr {}
;
opt_terms : /* none */
| terminals
;
terminals : terminal { }
| terminals terminal { }
;
terminal : semi { }
| EOL { }
;
EOL : '\n';
semi: ';';
%%
If I change terminal
to terminals
in the stmt_if
rule, I get a conflict
stmt_if : If expr Then opt_stmt EndIf terminals
{
}
;
I need to use terminals
because the language syntax allows statements like:
If 1 = 1 Then
// ...
EndIf;
as well as:
If 1 = 1 Then
// ...
EndIf
;
I don't understand why there is ambiguity here, as 'terminals' should accommodate the optional newline and semicolon. How can I write a rule that allows for multiple spaces and a semicolon at the end without ambiguity?
Thanks in advance for your help!
The basic problem is that by having terminals
part of both stmt_if
and stmts
, it is ambiguous as which any give terminal should be associated with. It is made more complicated by the fact that terminals
can only be between two stmt
in a stmts
(not at the end), so any terminals
at the end must be associated with the stmt_if
(or cause an error if the last stmt
is not a stmt_if
.
Now in your case, you probably don't care which stmt or stmts any given terminal is associated with, as you're just going to throw them away. What you do care about is allowing them anywhere they should be allowed and disallowing them anywhere they should not be (and what those are is not clear from your grammar -- should they not be allowed after the last stmt
in a function or procedure unless that is an if? That is what it appears you are doing currently)
The easiest fix is to pick one place in the grammar to put them and put them (only) there. For example, you might put them in the stmt
rules -- each kind of stmt
would end with terminals
or opt_terminals
(depending on whether at least one terminal is required or not for that statement). But this would then allow extra terminals at the end of a procedure/function body which you might not want.