I wanted to make a reader which reads configuration files similar to INI files for mswin. It is for exercise to teach myself using a lexer/parser generator which I made. The grammar is:
%lexer
HEADER ::= "\\[[0-9a-zA-Z]+\\]"
TRUE ::= "yes|true"
FALSE ::= "no|false"
ASSIGN ::= "="
OPTION_NAME ::= "[a-zA-Z][0-9a-zA-Z]*"
INT ::= "[0-9]+"
STRING ::= "\"(\\\"|[^\"])*\""
CODE ::= "<{(.*)}>"
BLANK ::= "[ \t\f]+" :ignore
COMMENT ::= "#[^\n\r]*(\r|\n)?" :ignore
NEWLINE ::= "\r|\n"
%parser
Options ::= OptionGroup Options | OptionGroup | @epsilon@
OptionGroup ::= HEADER NEWLINE OptionsList
OptionsList ::= Option NEWLINE OptionsList | Option
Option ::= OPTION_NAME ASSIGN OptionValue
OptionValue ::= TRUE | FALSE | INT | STRING | CODE
The problem lies in the @epsilon@
production. I added it because I want my reader to accept also empty files. But I'm getting conflicts when 'OptionsList' or 'OptionGroup' contains an epsilon production. I tried rearrange elements in productions, but I'm only getting conflicts (r/r or s/r, depending of what I did), unless I remove the epsilon completely from my grammar. It removes the problem, but...in my logic one of 'OptionsList' or 'OptionGroup' should contain an epsilon, otherwise my goal to accepting empty files is not met.
My parser generator uses LR(1) method, so I thought I can use epsilon productions in my grammar. It seems I'm good at writing generators, but not in constructing error-less grammars :(.
Should I forget about epsilons? Or is my grammar accepting empty inputs even when there is no epsilon production?
Your Options
production allows an Options
to be a sequence of OptionGroup
s, starting with either an empty list or a list consisting of a single element. That's obviously ambiguous, because a list of exactly one OptionGroup
could be:
OptionGroup
@epsilon@
with the addition of an OptionGroup
.In short, instead of
Options ::= OptionGroup Options | OptionGroup | @epsilon@
you need
Options ::= OptionGroup Options | @epsilon@
which matches exactly the same set of sentences, but unambiguously.
In general terms, you are usually better off writing left-recursive rules for bottom-up parsers. So I would have written
Options ::= Options OptionGroup | @epsilon@