parsing compiler-construction bison yacc shift-reduce-conflict

Eliminate shift/reduce conflicts caused by production rules with same prefix

Here is the simplified yaac file:

%token CONTEXT_    // the corresponding string is "context"
%token CONTEXTREF_ //"contextref"
%token IS_         //"is"
%token ID_L        //"id_l"
%token ID_L1       //"id_l1"
%token LIB_

%start design_file

%%
design_file   :design_unit
              |design_file design_unit 
              ;

design_unit   :context_cl LIB_
              |context_decl
              ;

context_cl    : /* empty */  {  }
              |context_cl context_ref
              ;

context_decl  :CONTEXT_ ID_L IS_ ';'
              ;

context_ref   :CONTEXT_ ID_L1 '.' ID_L ';'
              ;

There are 2 shift/reduce conflicts.

CONTEXT_  shift, and go to state 1

CONTEXT_  [reduce using rule 5 (context_cl)]

Rule 5 is context_cl : /* empty */ { }.

Generally, this doesn't matter, the parser works well most of the time. But in one strange case, for a source file like following:

context id_l is ...
...

The parser will not shift but reduce, hence using context_ref :CONTEXT_ ID_L1 '.' ID_L ';' to parse context id_l is ..., which causes syntax errors.

So I have to eliminate the conflicts. I guess they are caused by rule context_decl and context_ref both having CONTEXT_ in the beginning.

To validate my guess, I just changed context_ref :CONTEXT_ ID_L1 '.' ID_L ';' to context_ref :CONTEXTREF_ ID_L1 '.' ID_L ';'. Then, there is no conflict and there is no syntax error.

However, I can't replace context_ref this way. This is the standard.

So, how can I avoid these 2 conflicts by other means? Or, is there a general method to tackle conflicts of this kind?

And, for shift/reduce conflicts, will the parser choose to reduce rather than shift? Because I think the parser will always shift over reduce.

Solution

The basic pattern which gives rise to this conflict is:

union : x x_rest
      | y_list y_rest
y_list: /* empty */
      | y_list y

where x and y both have a production starting with the same symbol.

It's worth noting that the problem will vanish if y_list is one-or-more ys, instead of zero or more:

y_list: y
      | y_list y

as long as x and y are really distinguishable. (There are some other requirements, depending on the rest of the production, but basically it's OK if they have the same prefix. They could even be the same provided that the continuations differ in the first token.)

If you really need y_list to be potentially empty (or both x and y to be potentially empty), you should do the usual null-production elimination, which involves factoring the potentially empty production out of the grammar:

union : x x_rest
      | y_rest           /* Corresponds to an empty y_list */
      | y_list y_rest
y_list: y                /* As above, this list cannot be empty */
      | y_list y