parsingocamlshift-reduce-conflictmenhir

Why do I always have a shift/reduce conflict with my EOF / linebreak nonterminals?


So, I'm still pretty junior when it comes to putting together parsing grammars. I need help dissecting conflicts reported by Menhir when I have shift-reduce conflicts.

Take this little grammar as an example:

(* {2 Tokens } *)
%token EOF
%token COLON PIPE SEMICOLON
%token <string> COUNT
%token <string> IDENTIFIER

%start <AST.t> script
%start <AST.statement> statement

%%
(* {2 Rules } *)

script:
 | it = separated_list(break, statement); break?; EOF { { statements = it } }
 ;

statement:
 | COLON*; count = COUNT?; cmd = command { AST.make_statement ~count ~cmd }
 ;

command:
 | it = IDENTIFIER { it }
 ;

break:
 | SEMICOLON { }
 ;

%%

Menhir's --explain flag produces this description of the resulting shift/reduce conflict. Unfortunately, I can't make heads nor tails of it:

** Conflict (shift/reduce) in state 3.
** Token involved: SEMICOLON
** This state is reached from script after reading:

statement 

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

script 
(?)

** In state 3, looking ahead at SEMICOLON, shifting is permitted
** because of the following sub-derivation:

loption(separated_nonempty_list(break,statement)) option(break) EOF 
separated_nonempty_list(break,statement) 
statement break separated_nonempty_list(break,statement) 
          . SEMICOLON 

** In state 3, looking ahead at SEMICOLON, reducing production
** separated_nonempty_list(break,statement) -> statement 
** is permitted because of the following sub-derivation:

loption(separated_nonempty_list(break,statement)) option(break) EOF // lookahead token appears because option(break) can begin with SEMICOLON
separated_nonempty_list(break,statement) // lookahead token is inherited
statement . 

I've spent the evening trying to dig through documentation of what a shift/reduce conflict actually is, but I have to admit I'm having a really hard time making any sense of what I'm reading. Can somebody give me a simple (well, as much as possible) explanation of shift/reduce conflicts? Specifically using the context of the above example?


Solution

  • The problem is that, upon looking at a SEMICOLON, the parser can not decide if it should expect EOF or the rest of the list. The reason is that you use break as an optional terminator, instead of a separator.

    I propose you change your main rule:

    script:
     | it = optterm_list(break, statement); EOF { { statements = it } }
     ;
    

    And define the optterm_list combinator yourself, with something like this:

    optterm_list(separator, X):
      | separator? {[]}
      | l=optterm_nonempty_list(separator, X) { l } 
    optterm_nonempty_list(separator, X):
      | x = X separator? { [ x ] }
      | x = X
        separator
        xs = optterm_nonempty_list(separator, X)
         { x :: xs }