parsinggrammarcontext-free-grammarjavacc

Different ways to declare LOOKAHEAD in JAVACC


I have been understanding Javacc grammar to write the parser where i found a line which tells like

Options : { LOOKAHEAD=3; }

I wondered what was the LOOKAHEAD and is there any different ways to declare the lookahead ?


Solution

  • The LOOKAHEAD options is the default number of tokens that will be used to make a decision on which path will be chosen at each choice point.

    This configuration is required to resolve choice conflicts as javacc does not support backtracking.

    The default value for the LOOKAHEAD option is 1. So it is important to write the grammar that suits for LOOKAHEAD=1, meaning choices will be resolved by looking one token ahead.

    For Example :

    As the JavaCC is a LL parser, if the left token of a production rule is repeated, a choice conflict will arise:

    void PhoneNumber() : {} {​
    
    (LocalNumber() | CountryNumber()) <EOF> ​
    
    }​
    
    void LocalNumber() : {} { ​
    
    AreaCode() "-" <FOUR_DIGITS>  ​
    
    }​
    
    void CountryNumber() : {} {  ​
    
    AreaCode() "-" <THREE_DIGIT> "-" <FOUR_DIGITS>   ​
    
    }​
    
    void AreaCode() : {} {   ​
    
     <THREE_DIGIT> ​
    
    }
    

    Note that both CountryNumber and LocalNumber start with terminal <THREE_DIGIT> which causes the choice conflict,

    This can be rewritten by an approach called "left​ factoring​."

    void PhoneNumber() : {} {​
    
    AreaCode() "-" (LocalNumber() | CountryNumber()) <EOF> ​
    
    }​
    
    void LocalNumber() : {} { ​
    
    <FOUR_DIGITS>  ​
    
    }​
    
    void CountryNumber() : {} {  ​
    
    <THREE_DIGIT> "-" <FOUR_DIGITS>   ​
    
    }​
    
    void AreaCode() : {} {   ​
    
     <THREE_DIGIT> ​
    
    }
    

    For the case where this cannot be done, a Lookahead specification is used

    Lookahead specifications are of five types as,

    Multiple token LOOKAHEAD

    This can be used with the integer passed in the LOOKAHEAD(k) method. It can be done by,​

    Syntactic LOOKAHEAD

    The Syntactic lookahead ​specification uses a syntactic construct as a choice resolver.

    When specifying the syntactic​ lookahead without multiple token, what you're really​ specifying is syntactic lookahead with infinite multiple tokens.​

    Eg: LOOKAHEAD(2147483647,​ LocalNumber())​

    void PhoneNumber() : {} {​
    
    (​
    LOOKAHEAD(("A"|"B")+ AreaCode() "-" <FOUR_DIGITS>) LocalNumber() ​
    
    | CountryNumber()​
    
    ) <EOF> ​
    
    }
    

    Combination of Multiple token and Syntactic LOOKAHEAD

    With this we could Lookahead ​a limited number of tokens as long as the syntactic lookahead satisfied within the specified choice will be made.

    void PhoneNumber() : {} {​
    
    (​
    LOOKAHEAD(10, ("A"|"B")+ AreaCode() "-" <FOUR_DIGITS>) LocalNumber() ​
    
    | CountryNumber()​
    
    ) <EOF> ​
    
    }
    

    Semantic LOOKAHEAD

    Semantic lookahead involves embedding a Java boolean expression in the grammar at the choice point.​

    If boolean expression evaluates to true​, the current expansion is chosen.

    void PhoneNumber() : {} {​
    
    (​
    LOOKAHEAD({getToken(1).image.equals("123")}) ​
    KeysvilleNumber() ​
    
    | FarmvilleNumber()​
    
    ) <EOF> ​
    
    }
    

    Nested Lookahead

    Nested lookahead occurs when​ one lookahead directive ​overlaps another.​ In JavaCC nested syntactic lookahead​ is ignored, but not the nested​ semantic lookahead.​

    void Start() : {} {​
    
    (​
    LOOKAHEAD(Fullname()) Fullname() ​
    
    | Douglas()​
    
    ) <EOF> ​
    
    }​
    
    void Fullname() : {} {​
    
    ( LOOKAHEAD(Douglas() Munro()) ​
    
     Douglas()​
    
    | Douglas() Albert()​
    
    )​
    
    Munro() ​
    
    }​
    
    void Douglas() : {} { "Douglas" } ​
    
    void Albert() : {} { "Albert" }}​
    
    void Munro() : {} { "Munro" }}
    

    All the examples are taken from the Awesome book Generating Parsers with JavaCC by Tom Copeland

    Hope this is useful, Thanks.