I'm trying to implement a grammar that allows multiplication by juxtaposition. This is for parsing polynomial inputs for a CAS.
It works quite well, except few edge cases, as far as I'm aware of. There are two problems I have identified:
a^2 b
is (erroneously) parsed as (^ a (* 2 b))
, not as (* (^ a 2) b)
.28 shift/reduce conflicts
and 8 reduce/reduce conflicts
.I'm pretty sure properly resolving the first issue will resolve the second as well, but so far I haven't been successful.
The following is the gist of the grammar that I'm working with:
%start prgm
%union {
double num;
char *var;
ASTNode *node;
}
%token <num> NUM
%token <var> VAR
%type <node> expr
%left '+' '-'
%left '*' '/'
%right '^'
%%
prgm: // nothing
| prgm '\n'
| prgm expr '\n'
;
expr: NUM
| VAR
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '^' expr
| expr expr %prec '*'
| '-' expr
| '(' expr ')'
;
%%
Removing the rule for juxtaposition (expr expr %prec '*'
) resolves the shift/reduce & reduce/reduce warnings.
Note that ab
in my grammar should mean (* a b)
.
Multi-character variables should be preceded by a quote('
); this is already handled fine in the lex
file.
The lexer ignores spaces(
) and tabs(\t
) entirely.
I'm aware of this question, but the use of juxtaposition here does not seem to indicate multiplication.
Any comments or help would be greatly appreciated!
P.S. If it helps, this is the link to the entire project.
As indicated in the answer to the question you linked, it is hard to specify the operator precedence of juxtaposition because there is no operator to shift. (As in your code, you can specify the precedence of the production expr: expr expr
. But what lookahead token will this reduction be compared with? Adding every token in FIRST(expr) to your precedence declarations is not very scalable, and might lead to unwanted precedence resolutions.
An additional problem with the precedence solution is the behaviour of the unary minus operator (an issue not addressed in the linked question), because as written your grammar allows a - b
to be parsed either as a subtraction or as the juxtaposed multiplication of a
and -b
. (And note that -
is in FIRST(expr), leading to one of the possibly unwanted resolutions I referred to above.)
So the best solutions, as recommended in the linked question, is to use a grammar with explicit precedence, such as the following: (Here, I used juxt
as the name of the non-terminal, rather than expr_sequence
):
%start prgm
%token NUM
%token VAR
%left '+' '-'
%left '*' '/'
%right '^'
%%
prgm: // nothing
| prgm '\n'
| prgm expr '\n'
expr: juxt
| '-' juxt
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '^' expr
juxt: atom
| juxt atom
atom: NUM
| VAR
| '(' expr ')'
This grammar may not be what you want:
-xy
into -(xy)
instead of (-x)y
, but it's not ideal. Also, it doesn't allow --x
(also, probably not a problem but not ideal). Finally, it does not parse -x^y
as -(x^y)
, but as (-x)^y
, which is contrary to frequent practice.a/xy
parses as a/(xy)
, but you would probably object to 2x^7
being parsed as (2x)^7
.The simplest way to avoid those issues is to use a grammar in which operator precedence is uniformly implemented with unambiguous grammar rules.
Here's an example which implements standard precedence rules (exponentiation takes precedence over unary minus; juxtaposing multiply has the same precedence as explicit multiply). It's worth taking a few minutes to look closely at which non-terminal appears in which production, and think about how that correlates with the desired precedence rules.
%union {
double num;
char *var;
ASTNode *node;
}
%token <num> NUM
%token <var> VAR
%type <node> expr mult neg expt atom
%%
prgm: // nothing
| prgm '\n'
| prgm error '\n'
| prgm expr '\n'
expr: mult
| expr '+' mult
| expr '-' mult
mult: neg
| mult '*' neg
| mult '/' neg
| mult expt
neg : expt
| '-' neg
expt: atom
| atom '^' neg
atom: NUM
| VAR
| '(' expr ')'