parsingpredictive

ambiguous or conflict in LL1 grammar for a shell in C


i'm implementing a LL(1) parser for a project of doing a shell implementation. i'm stuck trying to resolve conflicts in my grammar :

Parsing mode: LL(1).

Grammar:

     1. COMMAND_LINE -> COMPLETE_COMMAND PIPED_CMD
     2. PIPED_CMD -> PIPE COMPLETE_COMMAND PIPED_CMD
     3.            | ε
     4. COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
     5. CMD_PREFIX -> REDIRECTION CMD_PREFIX
     6.             | ε
     7. CMD_SUFFIX -> REDIRECTION CMD_SUFFIX
     8.             | CMD_ARG CMD_SUFFIX
     9.             | ε
    10. REDIRECTION -> REDIRECTION_OP WORD
    11.              | ε
    12. CMD -> WORD
    13. CMD_ARG -> WORD CMD_ARG
    14.          | SINGLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
    15.          | DOUBLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
    16.          | ε
    17. REDIRECTION_OP -> HERE_DOC
    18.                 | APPEND
    19.                 | INFILE
    20.                 | OUTFILE

i use syntax-cli to check my grammar, and the ll(1) parser is a home made implementation, i can link my implementation of the parser if needed. the conflict detected by syntax-cli are :

PIPE WORD SINGLE_QUOTE DOUBLE_QUOTE HERE_DOC APPEND INFILE OUTFILE $
CMD_SUFFIX 9 7/8 7/8 7/8 7/8 7/8 7/8 7/8 9
REDIRECTION 11 11 11 11 10/11 10/11 10/11 10/11 11
CMD_ARG 16 13/16 14/16 15/16 16 16 16 16 16

i've also tried this grammar :


COMMAND_LINE     
                 : COMPLETE_COMMAND PIPED_CMD
                 ;
PIPED_CMD        
                 : PIPE COMPLETE_COMMAND PIPED_CMD
                 |
                 ;
COMPLETE_COMMAND 
                 : REDIRECTION CMD REDIRECTION CMD_ARG REDIRECTION
                 ;
REDIRECTION      
                 : REDIRECTION_OP WORD
                 | 
                 ;

CMD              
                 : WORD
                 ;
CMD_ARG          
                 : WORD REDIRECTION CMD_ARG
                 | SINGLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
                 | DOUBLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
                 | REDIRECTION
                 ;
REDIRECTION_OP
                 : HERE_DOC
                 | APPEND
                 | INFILE
                 | OUTFILE
                 ;

but the parser don't work when using multiple redirections ...


Solution

  • Without more specification on your behalf, can't be sure to have it all. But indeed, this grammar is ambiguous.

    To build a LL(1) analyzer, you must be able to say, for any combination of symbol on the analyzer stack (symbol being either a terminal or non-terminal yet to read) and any word from the input buffer, what rule should apply.

    Put yourself in the situation where you code starts with a WORD (that is first thing that is in input buffer)

    You start by trying to analyze COMMAND_LINE

    If input buffer starts with WORD, then only one rule can lead to COMMAND_LINE, that is the rule COMPLETE_COMMAND PIPED_CMD (anyway, whatever input, there is only this rule. Either we can apply it, or it is a syntax error. But for now, no reason to raise a syntax error, this rule is compatible with a start with WORD).

    So, now, on your stack you have COMPLETE_COMMAND PIPED_CMD, and in input buffer, still the same WORD.

    The only possible rule for the top of the stack is COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX

    So, now, on your stack you have CMD_PREFIX CMD CMD_SUFFIX PIPED_CMD.

    And waiting in input buffer WORD

    2 rules can be applied from CMD_PREFIX :
    CMD_PREFIX -> REDIRECTION CMD_PREFIX
    or CMD_PREFIX -> ε

    None of them can start with WORD. So either we say that what we have here is an empty CMD_PREFIX (followed by a CMD starting with WORD)

    Or we can see it as a REDIRECTION followed by an empty prefix. REDIRECTION can be REDIRECTION -> ε

    So both are possible at this point. Either we have a CMD_PREFIX(ε) or we have a CMD_PREFIX(REDIRECTION(ε), ε) (or even more recursions).

    For the grammar to be LL(1), we should not have to go deeper to decide. From this point, with the only knowledge that next lexem is WORD, we should be able to choose among those too. We aren't.

    (In fact, even with other grammar than LL(1), we couldn't decide)