i'm implementing a LL(1) parser for a project of doing a shell implementation. i'm stuck trying to resolve conflicts in my grammar :
Parsing mode: LL(1).
Grammar:
1. COMMAND_LINE -> COMPLETE_COMMAND PIPED_CMD
2. PIPED_CMD -> PIPE COMPLETE_COMMAND PIPED_CMD
3. | ε
4. COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
5. CMD_PREFIX -> REDIRECTION CMD_PREFIX
6. | ε
7. CMD_SUFFIX -> REDIRECTION CMD_SUFFIX
8. | CMD_ARG CMD_SUFFIX
9. | ε
10. REDIRECTION -> REDIRECTION_OP WORD
11. | ε
12. CMD -> WORD
13. CMD_ARG -> WORD CMD_ARG
14. | SINGLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
15. | DOUBLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
16. | ε
17. REDIRECTION_OP -> HERE_DOC
18. | APPEND
19. | INFILE
20. | OUTFILE
i use syntax-cli to check my grammar, and the ll(1) parser is a home made implementation, i can link my implementation of the parser if needed. the conflict detected by syntax-cli are :
PIPE | WORD | SINGLE_QUOTE | DOUBLE_QUOTE | HERE_DOC | APPEND | INFILE | OUTFILE | $ | |
---|---|---|---|---|---|---|---|---|---|
CMD_SUFFIX | 9 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 7/8 | 9 |
REDIRECTION | 11 | 11 | 11 | 11 | 10/11 | 10/11 | 10/11 | 10/11 | 11 |
CMD_ARG | 16 | 13/16 | 14/16 | 15/16 | 16 | 16 | 16 | 16 | 16 |
i've also tried this grammar :
COMMAND_LINE
: COMPLETE_COMMAND PIPED_CMD
;
PIPED_CMD
: PIPE COMPLETE_COMMAND PIPED_CMD
|
;
COMPLETE_COMMAND
: REDIRECTION CMD REDIRECTION CMD_ARG REDIRECTION
;
REDIRECTION
: REDIRECTION_OP WORD
|
;
CMD
: WORD
;
CMD_ARG
: WORD REDIRECTION CMD_ARG
| SINGLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
| DOUBLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
| REDIRECTION
;
REDIRECTION_OP
: HERE_DOC
| APPEND
| INFILE
| OUTFILE
;
but the parser don't work when using multiple redirections ...
Without more specification on your behalf, can't be sure to have it all. But indeed, this grammar is ambiguous.
To build a LL(1) analyzer, you must be able to say, for any combination of symbol on the analyzer stack (symbol being either a terminal or non-terminal yet to read) and any word from the input buffer, what rule should apply.
Put yourself in the situation where you code starts with a WORD
(that is first thing that is in input buffer)
You start by trying to analyze COMMAND_LINE
If input buffer starts with WORD
, then only one rule can lead to COMMAND_LINE
, that is the rule COMPLETE_COMMAND PIPED_CMD
(anyway, whatever input, there is only this rule. Either we can apply it, or it is a syntax error. But for now, no reason to raise a syntax error, this rule is compatible with a start with WORD
).
So, now, on your stack you have COMPLETE_COMMAND PIPED_CMD
, and in input buffer, still the same WORD
.
The only possible rule for the top of the stack is COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
So, now, on your stack you have CMD_PREFIX CMD CMD_SUFFIX PIPED_CMD
.
And waiting in input buffer WORD
2 rules can be applied from CMD_PREFIX
:
CMD_PREFIX -> REDIRECTION CMD_PREFIX
or CMD_PREFIX -> ε
None of them can start with WORD
. So either we say that what we have here is an empty CMD_PREFIX
(followed by a CMD
starting with WORD
)
Or we can see it as a REDIRECTION
followed by an empty prefix. REDIRECTION
can be REDIRECTION -> ε
So both are possible at this point. Either we have a CMD_PREFIX(ε)
or we have a CMD_PREFIX(REDIRECTION(ε), ε)
(or even more recursions).
For the grammar to be LL(1), we should not have to go deeper to decide. From this point, with the only knowledge that next lexem is WORD
, we should be able to choose among those too. We aren't.
(In fact, even with other grammar than LL(1), we couldn't decide)