
Check if a condition is met before executing the action in JFlex


I am writing a lexical analyzer using JFlex. When the word co is matched, everything after it up to the end of the line must be ignored, because it is a comment. At the moment I use a boolean flag that is set to true when co is matched. If an identifier or an operator is matched after co and before the end of the line, I ignore it, because the Identifier and Operator actions each start with an if check on that flag.
Is there a better way to do this that gets rid of the if statement repeated in every action?

Here is the code:

%% // Options of the scanner

%class Lexer     
%unicode        
%line      
%column      
%standalone 

%{
    private boolean isCommentOpen = false;
    private void toggleIsCommentOpen() {
        this.isCommentOpen = ! this.isCommentOpen;
    }
    private boolean getIsCommentOpen() {
        return this.isCommentOpen;
    }
%} 

Operators           = [\+\-]
Identifier          = [A-Z]*

EndOfLine           = \r|\n|\r\n

%%
{Operators}         {
                        if (! getIsCommentOpen()) {
                            // Do Code
                        }
                    }  
 
{Identifier}        {
                        if (! getIsCommentOpen()) {
                            // Do Code
                        }
                    }

"co"                {
                        toggleIsCommentOpen();
                    }

.                   {}

{EndOfLine}         {
                        if (getIsCommentOpen()) {
                            toggleIsCommentOpen();
                        }
                    }

Solution

  • One way to do this is to use lexical states in JFlex. Every time the word co is matched, the lexer enters a state named COMMENT_STATE and ignores everything until the end of the line. At the end of the line, it leaves COMMENT_STATE and returns to YYINITIAL. Here is the code:

    %% // Options of the scanner
    
    %class Lexer     
    %unicode        
    %line      
    %column      
    %standalone  
    
    Operators           = [\+\-]
    /* + instead of *, so that Identifier never matches the empty string */
    Identifier          = [A-Z]+
    
    EndOfLine           = \r|\n|\r\n
    
    /* YYINITIAL is predefined by JFlex; only the new exclusive state is declared */
    %xstate COMMENT_STATE
    
    %%
    <YYINITIAL> {
        "co" {yybegin(COMMENT_STATE);}
    }
    
    <COMMENT_STATE> {
        {EndOfLine} {yybegin(YYINITIAL);}
        .           {}
    }
    
    {Operators} { /* Do Code */ }

    {Identifier} { /* Do Code */ }
    
    . {}
    
    {EndOfLine} {}
    

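    With this approach, the lexer is simpler and more readable. Because COMMENT_STATE is declared with %xstate (an exclusive state), the rules without a state prefix are not active inside it, so the actions no longer need to check a flag.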
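
To make the state switching concrete, here is a hand-written Java sketch of the same two-state logic. It is not the code JFlex generates, only an illustration of the behaviour described above. The class name CommentStateDemo and the method tokenize are hypothetical names chosen for this example, and the sketch assumes the "Do Code" actions simply collect the matched text.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentStateDemo {
    // Mirrors the two lexical states used in the JFlex specification.
    private enum State { YYINITIAL, COMMENT_STATE }

    /** Returns the identifier and operator lexemes found outside "co" comments. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENT_STATE) {
                // Inside a comment: discard everything; a line break ends the comment.
                if (c == '\n' || c == '\r') {
                    state = State.YYINITIAL;
                }
                i++;
            } else if (input.startsWith("co", i)) {
                // Equivalent of: "co" { yybegin(COMMENT_STATE); }
                state = State.COMMENT_STATE;
                i += 2;
            } else if (c == '+' || c == '-') {
                tokens.add(String.valueOf(c)); // Operators
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                // Identifier: longest run of upper-case letters.
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                i++; // Any other character is ignored, like the catch-all rule.
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [A, +, B, E, -, F]: "C-D" is skipped because it follows "co".
        System.out.println(tokenize("A+B co C-D\nE-F"));
    }
}
```

Note that only the comment branch looks at the state; the operator and identifier branches need no flag check, which is the same simplification the exclusive state gives in the JFlex specification.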