bisonyacclex

Resolving shift/reduce conflict in Cypher parser caused by SET clause defined in multiple rules


I'm working on a project to support Cypher clauses in Postgres psql. All PostgreSQL and Cypher commands will pass through Cypher's parser. For more details, see the following Q&A: How to differentiate a Cypher clause from an SQL clause in C?. In some cases, to build Cypher clauses, I need PostgreSQL-like rules. For example, the SET rule:

set_clause:
     SET assign_list { set = true; }
     | set_clause SET assign_list
     ;

This rule is hidden to avoid conflicts so that the calling order of the rules for the MATCH clause, for example, is as follows:

%start query

query:
...
match_clause
...
;

match_clause:
     optional_opt MATCH pattern_list set_clause_opt
     ;

set_clause_opt:
     /* empty */
     | set_clause
     ;

set_clause:
     SET assign_list { set = true; }
     | set_clause SET assign_list
     ;

That is, the SET command only works in conjunction with another Cypher clause, such as MATCH. Otherwise, this will be a PostgreSQL command and will not be recognized by Cypher's parser. The problem is that there is a SET GRAPH command in the language I'm implementing. This command is defined in a separate rule, as the example below. This is causing some shift/reduce warnings because of the SET token definition in different rules.

SET GRAPH command:

query:
...
match_clause
| set_graph_clause
...
;

set_graph_clause:
     SET graph_option EQUALS graph_path_list { graph_name = $4; }
     ;

graph_option:
     GRAPH
     | GRAPH_PATH
     ;

Considering that I need set_graph_clause to be inside the query rule and that set_clause is just combined with other rules, how to resolve this conflict?


Solution

  • The problem is that you need two tokens of lookahead to see whether the SET is followed by a GRAPH or GRAPH_PATH token to known whether a SET on the lookahead is part of a set_clause that should be included with the current match_clause or is a set_graph that should be its own clause in the query.

    There are two basic approaches to making this work with an LR(1) parser

    1. Don't try to combine the set clauses in the parser. Instead parse the query as a sequence of (potentially) independent clauses, and then after parsing go through the list and combine the set-clauses with the immediately preceeding match-clause. You can give a good error message at this stage if you see a set-clause in the "wrong" place.

    2. Have your lexer tokenize SET GRAPH and SET GRAPH_PATH as single tokens. This way in the parser it is only one token and fits in the lookahead. This may be complex if you can have comments between the SET and the GRAPH

    Of course, there's a third option of using a more powerful parsing scheme like GLR or backtracking for more lookahead.