I'm trying to write a parser for a javascript-ish language with JFlex and Cup, but I'm having some issues with those deadly shift/reduce problems and reduce/reduce problems.
I have searched thoroughly and have found tons of examples, but I'm not able to extrapolate these to my grammar. My understanding so far is that these problems are because the parser cannot decide which way it should take because it can't distinguish.
My grammar is the following one: start with INPUT;
INPUT::= PROGRAM;
PROGRAM::= FUNCTION NEWLINE PROGRAM
| NEWLINE PROGRAM;
FUNCTION ::= function OPTIONAL id p_izq ARG p_der NEWLINE l_izq NEWLINE BODY l_der;
OPTIONAL ::=
| TYPE;
TYPE::= integer
| boolean
ARG ::=
| TYPE id MORE_ARGS;
MORE_ARGS ::=
| colon TYPE id MORE_ARGS;
NEWLINE ::= salto NEWLINE
| ;
BODY ::= ;
I'm getting several conflicts but these 2 are a mere example:
Warning : *** Shift/Reduce conflict found in state #5
between NEWLINE ::= (*)
and NEWLINE ::= (*) salto NEWLINE
under symbol salto
Resolved in favor of shifting.
Warning : *** Shift/Reduce conflict found in state #0
between NEWLINE ::= (*)
and FUNCTION ::= (*) function OPTIONAL id p_izq ARG p_der NEWLINE l_izq NEWLINE BODY l_der
under symbol function
Resolved in favor of shifting.
PS: The grammar is far more complex, but I think that if I see how these shift/reduce problems are solved, I'll be able to fix the rest.
Thanks for your answers.
PROGRAM
is useless (in the technical sense). That is, it cannot produce any sentence because in
PROGRAM::= FUNCTION NEWLINE PROGRAM
| NEWLINE PROGRAM;
both productions for PROGRAM
are recursive. For a non-terminal to be useful, it needs to be able to eventually produce some string of terminals, and for it to do that, it must have at least one non-recursive production; otherwise, the recursion can never terminate. I'm surprised CUP didn't mention this to you. Or maybe it did, and you chose to ignore the warning.
That's a problem -- useless non-terminals really cannot ever match anything, so they will eventually result in a parse error -- but it's not the parsing conflict you're reporting. The conflicts come from another feature of the same production, which is related to the fact that you can't divide by 0.
The thing about nothing is that any number of nothings are still nothing. So if you had plenty of nothings and someone came along and asked you exactly how many nothings you had, you'd have a bit of problem, because to get "plenty" back from "0 * plenty", you'd have to compute "0 / 0", and that's not a well-defined value. (If you had plenty of two's, and someone asked you how many two's you had, that wouldn't be a problem: suppose the plenty of two's worked out to 40; you could easily compute that 40 / 2 = 20, which works out perfectly because 20 * 2 = 40.)
So here we don't have arithmetic, we have strings of symbols. And unfortunately the string containing no symbols is really invisible, like 0 was for all those millenia until some Arabian mathematician noticed the value of being able to write nothing.
Where this is all coming around to is that you have
PROGRAM::= ... something ...
| NEWLINE PROGRAM;
But NEWLINE
is allowed to produce nothing.
NEWLINE ::= salto NEWLINE
| ;
So the second recursive production of PROGRAM
might add nothing. And it might add nothing plenty of times, because the production is recursive. But the parser needs to be deterministic: it needs to know exactly how many nothings are present, so that it can reduce each nothing to a NEWLINE
non-terminal and then reduce a new PROGRAM
non-terminal. And it really doesn't know how many nothings to add.
In short, optional nothings and repeated nothings are ambiguous. If you are going to insert a nothing into your language, you need to make sure that there are a fixed finite number of nothings, because that's the only way the parser can unambiguously dissect nothingness.
Now, since the only point of that particular recursion (as far as I can see) is to allow empty newline-terminated statements (which are visible because of the newline, but do nothing). And you could do that by changing the recursion to avoid nothing:
PROGRAM ::= ...
| salto PROGRAM;
Although it's not relevant to your current problem, I feel obliged to mention that CUP is an LALR parser generator and all that stuff you might have learned or read on the internet about recursive descent parsers not being able to handle left recursion does not apply. (I deleted a rant about the way parsing technique are taught, so you'll have to try to recover it from the hints I've left behind.) Bottom-up parser generators like CUP and yacc/bison love left recursion. They can handle right recursion, too, of course, but reluctantly because they need to waste stack space for every recursion other than left recursion. So there is no need to warp your grammar in order to deal with the deficiency; just write the grammar naturally and be happy. (So you rarely if ever need non-terminals representing "the rest of" something.)
PD: Plenty of nothing is a culturally-specific reference to a song from the 1934 opera Porgy and Bess.