I am writing a lexical analyzer using JFlex. When the word co is matched, everything after it until the end of the line has to be ignored (because it's a comment). For the moment, I have a boolean variable that is set to true whenever this word is matched, and if an identifier or an operator is matched after co before the end of the line, I simply ignore it because I have an if condition in my Identifier and Operator token rules.
I am wondering if there is a better way to do this and get rid of this if statement that appears everywhere?
Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
%{
private boolean isCommentOpen = false;
private void toggleIsCommentOpen() {
this.isCommentOpen = ! this.isCommentOpen;
}
private boolean getIsCommentOpen() {
return this.isCommentOpen;
}
%}
Operators = [\+\-]
Identifier = [A-Z]*
EndOfLine = \r|\n|\r\n
%%
{Operators} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
{Identifier} {
if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
// Do Code
}
}
"co" {
toggleIsCommentOpen();
}
. {}
{EndOfLine} {
if (getIsCommentOpen()) {
toggleIsCommentOpen();
}
}
One way to do this is to use lexical states in JFlex. Every time the word co is matched, the scanner enters a state named COMMENT_STATE and ignores everything until the end of the line. At the end of the line, it leaves COMMENT_STATE and returns to YYINITIAL. Here is the code:
%% // Options of the scanner
%class Lexer
%unicode
%line
%column
%standalone
Operators = [\+\-]
Identifier = [A-Z]+
EndOfLine = \r|\n|\r\n
%xstate COMMENT_STATE
%%
<YYINITIAL> {
"co" {yybegin(COMMENT_STATE);}
}
<COMMENT_STATE> {
{EndOfLine} {yybegin(YYINITIAL);}
. {}
}
{Operators} { /* Do Code */ }
{Identifier} { /* Do Code */ }
. {}
{EndOfLine} {}
With this new approach, the lexer is simpler and more readable.
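To make the effect of the state switch easier to see, here is a small hand-written Java sketch that mimics the two states. It is only an illustration and not the code JFlex generates: the class name CommentStateSketch, the State enum, and the ID(...)/OP(...) token strings are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of the two lexical states; NOT what JFlex generates.
class CommentStateSketch {
    // Hypothetical names that mirror YYINITIAL and COMMENT_STATE from the specification.
    enum State { INITIAL, COMMENT }

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        State state = State.INITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENT) {
                // Only two rules are active here: a line terminator leaves
                // the state, any other character is skipped.
                if (c == '\r' || c == '\n') {
                    state = State.INITIAL;
                }
                i++;
            } else if (input.startsWith("co", i)) {
                state = State.COMMENT; // same role as yybegin(COMMENT_STATE)
                i += 2;
            } else if (c == '+' || c == '-') {
                out.add("OP(" + c + ")");
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                out.add("ID(" + input.substring(start, i) + ")");
            } else {
                i++; // any other character, including line terminators, is ignored
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [ID(AB), OP(+), ID(CD), ID(IJ), OP(-), ID(K)]
        System.out.println(tokens("AB + CD co EF - GH\nIJ - K"));
    }
}
```

The point is that the operator and identifier branches never need to check a flag: while the scanner is in the comment state they are simply not reachable, which is what the exclusive state gives you in the JFlex specification.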