I have been understanding Javacc grammar to write the parser where i found a line which tells like
Options : { LOOKAHEAD=3; }
I wondered what was the LOOKAHEAD and is there any different ways to declare the lookahead ?
The LOOKAHEAD options is the default number of tokens that will be used to make a decision on which path will be chosen at each choice point.
This configuration is required to resolve choice conflicts as javacc does not support backtracking.
The default value for the LOOKAHEAD
option is 1. So it is important to write the grammar that suits for LOOKAHEAD=1, meaning choices will be resolved by looking one token ahead.
For Example :
As the JavaCC is a LL parser, if the left token of a production rule is repeated, a choice conflict will arise:
void PhoneNumber() : {} {
(LocalNumber() | CountryNumber()) <EOF>
}
void LocalNumber() : {} {
AreaCode() "-" <FOUR_DIGITS>
}
void CountryNumber() : {} {
AreaCode() "-" <THREE_DIGIT> "-" <FOUR_DIGITS>
}
void AreaCode() : {} {
<THREE_DIGIT>
}
Note that both CountryNumber
and LocalNumber
start with terminal <THREE_DIGIT>
which causes the choice conflict,
This can be rewritten by an approach called "left factoring."
void PhoneNumber() : {} {
AreaCode() "-" (LocalNumber() | CountryNumber()) <EOF>
}
void LocalNumber() : {} {
<FOUR_DIGITS>
}
void CountryNumber() : {} {
<THREE_DIGIT> "-" <FOUR_DIGITS>
}
void AreaCode() : {} {
<THREE_DIGIT>
}
For the case where this cannot be done, a Lookahead specification is used
Lookahead specifications are of five types as,
Multiple token LOOKAHEAD
This can be used with the integer passed in the LOOKAHEAD(k) method. It can be done by,
Local LOOKAHEAD
void PhoneNumber() : {} { ( LOOKAHEAD(3) LocalNumber() | CountryNumber() ) }
Global LOOKAHEAD
Options : { LOOKAHEAD=3; }
Syntactic LOOKAHEAD
The Syntactic lookahead specification uses a syntactic construct as a choice resolver.
When specifying the syntactic lookahead without multiple token, what you're really specifying is syntactic lookahead with infinite multiple tokens.
Eg: LOOKAHEAD(2147483647, LocalNumber())
void PhoneNumber() : {} {
(
LOOKAHEAD(("A"|"B")+ AreaCode() "-" <FOUR_DIGITS>) LocalNumber()
| CountryNumber()
) <EOF>
}
Combination of Multiple token and Syntactic LOOKAHEAD
With this we could Lookahead a limited number of tokens as long as the syntactic lookahead satisfied within the specified choice will be made.
void PhoneNumber() : {} {
(
LOOKAHEAD(10, ("A"|"B")+ AreaCode() "-" <FOUR_DIGITS>) LocalNumber()
| CountryNumber()
) <EOF>
}
Semantic LOOKAHEAD
Semantic lookahead involves embedding a Java boolean expression in the grammar at the choice point.
If boolean expression evaluates to true, the current expansion is chosen.
void PhoneNumber() : {} {
(
LOOKAHEAD({getToken(1).image.equals("123")})
KeysvilleNumber()
| FarmvilleNumber()
) <EOF>
}
Nested Lookahead
Nested lookahead occurs when one lookahead directive overlaps another. In JavaCC nested syntactic lookahead is ignored, but not the nested semantic lookahead.
void Start() : {} {
(
LOOKAHEAD(Fullname()) Fullname()
| Douglas()
) <EOF>
}
void Fullname() : {} {
( LOOKAHEAD(Douglas() Munro())
Douglas()
| Douglas() Albert()
)
Munro()
}
void Douglas() : {} { "Douglas" }
void Albert() : {} { "Albert" }}
void Munro() : {} { "Munro" }}
All the examples are taken from the Awesome book Generating Parsers with JavaCC by Tom Copeland
Hope this is useful, Thanks.