I'm trying to extend the SQL language of SQLite at one point (file parse.y). I have a parsing conflict, but the Lemon parser generator reports nothing beyond a bare "1 parsing conflicts." error message.
The problem is located where create_table can start with either "CREATE" or "CREATE OR REPLACE", which is then followed by temp, which in turn can match an empty token sequence.
cmd ::= create_table create_table_args table_properties_args.
create_table ::= createorreplace(C) temp(T) TABLE ifnotexists(E) nm(X) dbnm(Y). {
// ...
}
%type createorreplace {int}
createorreplace(A) ::= CREATE. {disableLookaside(pParse); A = 0;}
createorreplace(A) ::= CREATE OR REPLACE. {disableLookaside(pParse); A = 1;}
%type temp {int}
temp(A) ::= TEMP. {A = pParse->db->init.busy==0;}
temp(A) ::= . {A = 0;}
How can I make "OR REPLACE" optional, while preserving that it may be followed by TEMP?
Since I can only guess how and where you might have changed SQLite's SQL grammar, this answer is necessarily somewhat tentative. But it might be useful anyway.
The original SQL grammar contains the following productions (I left out the actions since they are never relevant in diagnosing conflicts):
cmd ::= create_table create_table_args.
create_table ::= createkw temp(T) TABLE
ifnotexists(E) nm(Y) dbnm(Z).
createkw(A) ::= CREATE(A).
temp(A) ::= TEMP.
temp(A) ::= .
cmd ::= createkw(X) temp(T) VIEW
ifnotexists(E) nm(Y) dbnm(Z) eidlist_opt(C) AS select(S).
You seem to have modified create_table to instead read:
create_table ::= createorreplace(C) temp(T) TABLE
ifnotexists(E) nm(X) dbnm(Y).
createorreplace(A) ::= CREATE.
createorreplace(A) ::= CREATE OR REPLACE.
That change indeed creates a conflict, but it has nothing to do with temp being nullable. In fact, it has very little to do with the non-terminal temp at all. You could replace temp with TEMP (thereby making it obligatory rather than optional) and you would still have a conflict; strictly speaking it is a reduce-reduce conflict, since the parser has to choose between two reductions of CREATE.
The conflict occurs for inputs which start CREATE TEMP. That input could be the start of
CREATE TEMP TABLE ...
CREATE TEMP VIEW ...
Those are obviously different syntaxes, and there is no ambiguity. But when the terminal CREATE has just been read and the terminal TEMP is the lookahead token, both of those possibilities are still available. That's not necessarily a problem; a bottom-up parser does not need to resolve which possible production will be used until it gets to the end of the production. So the original grammar works fine, without conflicts.
But note that the original grammar does not have a cmd production which starts with the terminal CREATE. What it has are several cmd productions which start with the non-terminal createkw. But there is no possibility of confusion there, either. The terminal CREATE is reduced to createkw in both cmd productions (and other cmd productions I didn't list, which also start with createkw).
However, in your modified grammar, the two productions do not both start with createkw. One of them was changed to start with createorreplace.
Inputs which do not include the optional keyword TEMP still parse without any problem. If TEMP is not present, the lookahead token will be TABLE in the create_table command, and the lookahead token will be VIEW in the create view command. Since the lookahead tokens differ, the parser has no trouble deciding whether to reduce to createkw or to reduce to createorreplace. Similarly, if the input were actually CREATE OR REPLACE ..., the lookahead token would be OR, which unambiguously forces a reduction to createorreplace.
But the problematic input, as shown above, starts CREATE TEMP. Now, the parser must decide, without seeing anything which follows the terminal TEMP, whether to reduce CREATE to createkw or to reduce it to createorreplace. Since that determination cannot be made, a conflict is reported. (And you'll find a lot more information about that conflict by looking through the Lemon report file, parse.out.)
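To look at the conflict in isolation, the guessed modification can be boiled down to just the rules discussed so far. This is only a sketch under my assumption about your change: the file name repro.y is made up, all actions are omitted, and the trailing parts of each command (ifnotexists, nm, dbnm and so on) are dropped because they play no part in the conflict.

```
// repro.y (hypothetical file name): reduced grammar, assumed to mirror the modified parse.y
cmd ::= create_table.
cmd ::= createkw temp VIEW.                   // unchanged: still starts with createkw

create_table ::= createorreplace temp TABLE.  // changed: starts with createorreplace

createkw ::= CREATE.                          // may be followed by TEMP or VIEW
createorreplace ::= CREATE.                   // may be followed by TEMP or TABLE
createorreplace ::= CREATE OR REPLACE.

temp ::= TEMP.
temp ::= .
```

Running lemon on a file like this should report a parsing conflict, and the generated report (repro.out) should show, in the state reached after shifting CREATE, two competing reductions for the lookahead token TEMP. That is the same information parse.out gives you for the full grammar, just with far less to scroll through.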
The solution (if my guess about your grammar modifications was correct) is to avoid forcing the parser to make an unnecessary decision. That requires a little bit of grammar duplication:
cmd ::= create_table create_table_args.
create_table ::= createkw temp(T) TABLE ifnotexists(E) nm(Y) dbnm(Z).
create_table ::= createorreplace temp(T) TABLE ifnotexists(E) nm(Y) dbnm(Z).
createkw(A) ::= CREATE(A).
createorreplace(A) ::= CREATE OR REPLACE.
temp(A) ::= TEMP.
temp(A) ::= .
cmd ::= createkw(X) temp(T) VIEW ifnotexists(E) nm(Y) dbnm(Z)
eidlist_opt(C) AS select(S).
Now, the terminal CREATE not followed by OR REPLACE is always reduced to createkw, while the sequence CREATE OR REPLACE is always reduced to createorreplace. This works because there is no possible parse for a cmd starting CREATE OR, other than CREATE OR REPLACE.
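One consequence of the duplication is that the flag which createorreplace(C) used to carry is now implied by which create_table production matched. Here is a reduced sketch of that idea, again with the trailing symbols dropped. The %type declaration and the A = 0 / A = 1 actions are purely illustrative assumptions; in the real parse.y each action would be whatever your create_table action does, with the "or replace" flag fixed at 0 or 1.

```
// Reduced sketch of the fixed grammar; the actions here are illustrative only
%type create_table {int}

cmd ::= create_table.
cmd ::= createkw temp VIEW.

// Plain CREATE [TEMP] TABLE: the "or replace" flag is known to be 0 here
create_table(A) ::= createkw temp TABLE.        {A = 0;}
// CREATE OR REPLACE [TEMP] TABLE: the flag is known to be 1 here
create_table(A) ::= createorreplace temp TABLE. {A = 1;}

createkw ::= CREATE.
createorreplace ::= CREATE OR REPLACE.          // no competing reduction of a bare CREATE

temp ::= TEMP.
temp ::= .
```

After CREATE, the only choices left are to reduce to createkw (on TEMP, TABLE or VIEW) or to shift OR, so as far as I can tell this reduced version contains no conflict.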