Tags: parsing, token, javacc

How to match any string between parentheses that may itself contain parentheses?


I am trying to create a JavaCC parser and I am facing an issue.

I want to return everything between parentheses in my text, but the string between those parentheses may itself contain parentheses.

For example, given the line Node(new MB34(MB78, MB654) => (MB7, M9)), I want a string equal to "new MB34(MB78, MB654) => (MB7, M9)". There is no specific pattern between the parentheses.

I have tried to use lexical states, following the JavaCC documentation:

SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }

TOKEN :
{
  < #LETTER             : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
| < #DIGIT              : [ "0"-"9" ] >
| < #ALPHA              : ( < LETTER > | < DIGIT > ) >
| < IDENTIFIER          : < LETTER > ( < ALPHA > )* >
}

TOKEN : {
  < "(" > : IN_LABEL
}

< IN_LABEL > TOKEN : {
  < TEXT_LABEL : ~[] >
}

< IN_LABEL > TOKEN : {
  < END_LABEL : ")"> : DEFAULT
}

String LABEL():
{
  Token token_label;
  String label = "";
}
{
  < IDENTIFIER >
  "(" ( token_label = < TEXT_LABEL > { label += token_label.toString(); } )+ < END_LABEL >
   {
     return label;
   }
}

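However, this doesn't work. The token that leaves the lexical state IN_LABEL is the single character ")", which TEXT_LABEL (~[]) also matches. Both matches are one character long, and JavaCC prefers the token declared first when matches have equal length, so END_LABEL never wins and the lexer consumes all the remaining text without returning to the DEFAULT state. I found a temporary workaround by replacing the END_LABEL token with: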

< IN_LABEL > TOKEN : {
  < END_LABEL : ~[]")"> : DEFAULT
}

But this doesn't work either, because this token can match before the real end of the label: for example, it matches "4)" at the end of MB654, well before the final parenthesis.
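Both failed attempts follow from how the JavaCC token manager picks a token: it takes the longest match and, when two rules match the same length, the one declared first. A rough way to see this is the plain-Java simulation that follows. The class LongestMatchDemo, its Rule and Result types, and the use of java.util.regex are illustrative assumptions, not JavaCC code; the sketch models only the IN_LABEL state and the way LABEL() concatenates TEXT_LABEL images.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // One lexical rule of the simplified model: a token name and its pattern.
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            // DOTALL so that "." behaves like JavaCC's ~[] (any character, newlines included)
            this.pattern = Pattern.compile(regex, Pattern.DOTALL);
        }
    }

    // Outcome of lexing one label: the concatenated TEXT_LABEL images and
    // whether an END_LABEL token was ever produced.
    static final class Result {
        final String label;
        final boolean closed;

        Result(String label, boolean closed) {
            this.label = label;
            this.closed = closed;
        }
    }

    // Simplified model of the IN_LABEL state: at each position the longest match
    // wins; on a tie the rule listed first wins (JavaCC's declaration-order rule).
    static Result lexLabel(String input, Rule... rules) {
        StringBuilder label = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : rules) {
                Matcher m = r.pattern.matcher(input);
                m.region(pos, input.length());
                // Strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                break; // no rule matches: JavaCC would report a lexical error
            }
            String image = input.substring(pos, pos + bestLen);
            pos += bestLen;
            if (best.name.equals("END_LABEL")) {
                return new Result(label.toString(), true);
            }
            label.append(image); // LABEL() concatenates the TEXT_LABEL images
        }
        return new Result(label.toString(), false); // input exhausted, no END_LABEL
    }

    public static void main(String[] args) {
        // Text seen by the lexer after the "(" that switched to IN_LABEL
        String afterParen = "new MB34(MB78, MB654) => (MB7, M9))";
        Result first = lexLabel(afterParen,
                new Rule("TEXT_LABEL", "."), new Rule("END_LABEL", "\\)"));
        Result second = lexLabel(afterParen,
                new Rule("TEXT_LABEL", "."), new Rule("END_LABEL", ".\\)"));
        System.out.println(first.closed + " " + first.label);
        // prints: false new MB34(MB78, MB654) => (MB7, M9))
        System.out.println(second.closed + " " + second.label);
        // prints: true new MB34(MB78, MB65
    }
}
```

With the first rule set, END_LABEL never wins, so the whole rest of the input ends up in the label; with the ~[]")" workaround, END_LABEL fires on "4)" and the label stops at "new MB34(MB78, MB65".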

Does anyone have a solution to this problem?


Solution

  • There may be a simpler solution, but here's mine:

    SKIP :
    { " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }
    
    TOKEN :
    {
      < #LETTER             : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
    | < #DIGIT              : [ "0"-"9" ] >
    | < #ALPHA              : ( < LETTER > | < DIGIT > ) >
    | < IDENTIFIER          : < LETTER > ( < ALPHA > )* >
    }
    
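    TOKEN_MGR_DECLS :
    {
        // Nesting depth below the first inner "(", used in IN_LABEL1
        int parLevel;
    }
    
    // An opening "(" in DEFAULT starts accumulating the label image
    MORE : {
        "(" : IN_LABEL
    }
    
    // A top-level ")" closes the label: drop the outer "(" and ")" from the image
    < IN_LABEL > TOKEN : {
        < TEXT_LABEL: ")" > {
            matchedToken.image = image.substring(1,image.length()-1);
        } : DEFAULT
    }
    
    // Any other character stays in the label
    < IN_LABEL > MORE : {
       <~["(", ")"]>
    }
    
    // A nested "(" switches to IN_LABEL1, which counts further nesting
    < IN_LABEL > MORE : {
        "(" {parLevel = 0;} : IN_LABEL1
    }
    
    // Deeper nesting
    < IN_LABEL1 > MORE : {
        "(" {++parLevel;}
    }
    
    // The ")" that matches the nested "(" returns to IN_LABEL
    < IN_LABEL1 > MORE : {
        ")" {
            if (0 == parLevel--) {
                SwitchTo(IN_LABEL);
            }
        }
    }
    
    < IN_LABEL1 > MORE : {
       <~["(", ")"]>
    }
    
    // TEXT_LABEL's image already excludes the outer parentheses
    String LABEL():
    {
      String label = "";
    }
    {
      < IDENTIFIER >
      label = < TEXT_LABEL >.image
       {
         return label;
       }
    }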
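To check the state logic of this answer without running JavaCC, here is a hypothetical plain-Java mirror of its token manager. The class NestedLabelLexer and the method textLabel are illustrative names, not generated code: a boolean stands in for the switch between IN_LABEL and IN_LABEL1, every MORE match appends to the pending image, and the final substring call matches the TEXT_LABEL action.

```java
public class NestedLabelLexer {
    // Hypothetical mirror of the answer's lexer. `open` is the index of the "("
    // that the DEFAULT-state MORE rule consumes (the one right after the identifier).
    static String textLabel(String input, int open) {
        if (open < 0 || open >= input.length() || input.charAt(open) != '(') {
            throw new IllegalArgumentException("expected '(' at index " + open);
        }
        StringBuilder image = new StringBuilder("("); // MORE : { "(" : IN_LABEL }
        boolean inLabel1 = false; // false ~ IN_LABEL, true ~ IN_LABEL1
        int parLevel = 0;         // same role as parLevel in TOKEN_MGR_DECLS
        for (int i = open + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            image.append(c); // every MORE match (and the final TOKEN) extends the image
            if (!inLabel1) {
                if (c == ')') {
                    // TOKEN TEXT_LABEL: drop the outer "(" and ")", back to DEFAULT
                    return image.substring(1, image.length() - 1);
                } else if (c == '(') {
                    parLevel = 0;    // "(" {parLevel = 0;} : IN_LABEL1
                    inLabel1 = true;
                }
            } else if (c == '(') {
                ++parLevel;          // deeper nesting
            } else if (c == ')') {
                if (0 == parLevel--) {
                    inLabel1 = false; // SwitchTo(IN_LABEL)
                }
            }
        }
        // JavaCC would report a lexical error at end of input here
        throw new IllegalArgumentException("unbalanced parentheses");
    }

    public static void main(String[] args) {
        String line = "Node(new MB34(MB78, MB654) => (MB7, M9))";
        System.out.println(textLabel(line, line.indexOf('(')));
        // prints: new MB34(MB78, MB654) => (MB7, M9)
    }
}
```

The separate IN_LABEL1 state is presumably needed because a JavaCC rule's kind (MORE or TOKEN) is fixed by its declaration rather than by its action, so a top-level ")" and a nested ")" have to be matched in different states to end up as different kinds.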