parsingshift-reduce-conflictreduce-reduce-conflictlr1

LR(1) Parser: Why adding an epsilon production makes r/r or s/r conflicts


I wanted to make a reader which reads configuration files similar to INI files for mswin. It is for exercise to teach myself using a lexer/parser generator which I made. The grammar is:

%lexer
HEADER ::= "\\[[0-9a-zA-Z]+\\]"
TRUE ::= "yes|true"
FALSE ::= "no|false"
ASSIGN ::= "="
OPTION_NAME ::= "[a-zA-Z][0-9a-zA-Z]*"
INT ::= "[0-9]+"
STRING ::= "\"(\\\"|[^\"])*\""
CODE ::= "<{(.*)}>"
BLANK ::= "[ \t\f]+" :ignore
COMMENT ::= "#[^\n\r]*(\r|\n)?" :ignore
NEWLINE ::= "\r|\n"
%parser
Options ::= OptionGroup Options | OptionGroup | @epsilon@
OptionGroup ::= HEADER NEWLINE OptionsList 
OptionsList ::= Option NEWLINE OptionsList | Option 
Option ::= OPTION_NAME ASSIGN OptionValue
OptionValue ::= TRUE | FALSE | INT | STRING | CODE 

The problem lies in the @epsilon@ production. I added it because I want my reader to accept also empty files. But I'm getting conflicts when 'OptionsList' or 'OptionGroup' contains an epsilon production. I tried rearrange elements in productions, but I'm only getting conflicts (r/r or s/r, depending of what I did), unless I remove the epsilon completely from my grammar. It removes the problem, but...in my logic one of 'OptionsList' or 'OptionGroup' should contain an epsilon, otherwise my goal to accepting empty files is not met.

My parser generator uses LR(1) method, so I thought I can use epsilon productions in my grammar. It seems I'm good at writing generators, but not in constructing error-less grammars :(.

Should I forget about epsilons? Or is my grammar accepting empty inputs even when there is no epsilon production?


Solution

  • Your Options production allows an Options to be a sequence of OptionGroups, starting with either an empty list or a list consisting of a single element. That's obviously ambiguous, because a list of exactly one OptionGroup could be:

    In short, instead of

    Options ::= OptionGroup Options | OptionGroup | @epsilon@
    

    you need

    Options ::= OptionGroup Options | @epsilon@
    

    which matches exactly the same set of sentences, but unambiguously.

    In general terms, you are usually better off writing left-recursive rules for bottom-up parsers. So I would have written

    Options ::= Options OptionGroup | @epsilon@