I'm working on a project to support Cypher clauses in Postgres psql. All PostgreSQL and Cypher commands will pass through Cypher's parser. For more details, see the following Q&A: How to differentiate a Cypher clause from an SQL clause in C?. In some cases, to build Cypher clauses, I need PostgreSQL-like rules. For example, the SET rule:
set_clause:
SET assign_list { set = true; }
| set_clause SET assign_list
;
This rule is hidden to avoid conflicts so that the calling order of the rules for the MATCH clause, for example, is as follows:
%start query
query:
...
match_clause
...
;
match_clause:
optional_opt MATCH pattern_list set_clause_opt
;
set_clause_opt:
/* empty */
| set_clause
;
set_clause:
SET assign_list { set = true; }
| set_clause SET assign_list
;
That is, the SET command only works in conjunction with another Cypher clause, such as MATCH. Otherwise, this will be a PostgreSQL command and will not be recognized by Cypher's parser. The problem is that there is a SET GRAPH command in the language I'm implementing. This command is defined in a separate rule, as the example below. This is causing some shift/reduce warnings because of the SET token definition in different rules.
SET GRAPH command:
query:
...
match_clause
| set_graph_clause
...
;
set_graph_clause:
SET graph_option EQUALS graph_path_list { graph_name = $4; }
;
graph_option:
GRAPH
| GRAPH_PATH
;
Considering that I need set_graph_clause
to be inside the query rule and that set_clause
is just combined with other rules, how to resolve this conflict?
The problem is that you need two tokens of lookahead to see whether the SET
is followed by a GRAPH
or GRAPH_PATH
token to known whether a SET
on the lookahead is part of a set_clause
that should be included with the current match_clause
or is a set_graph
that should be its own clause in the query.
There are two basic approaches to making this work with an LR(1) parser
Don't try to combine the set clauses in the parser. Instead parse the query as a sequence of (potentially) independent clauses, and then after parsing go through the list and combine the set-clauses with the immediately preceeding match-clause. You can give a good error message at this stage if you see a set-clause in the "wrong" place.
Have your lexer tokenize SET GRAPH
and SET GRAPH_PATH
as single tokens. This way in the parser it is only one token and fits in the lookahead. This may be complex if you can have comments between the SET
and the GRAPH
Of course, there's a third option of using a more powerful parsing scheme like GLR or backtracking for more lookahead.