flexbox bison yacc lex

How to return two tokens for one character in a lexer


I'm currently writing a Ruby language parser, and I want the lexer to produce two tokens for one character. For example, in the lexer I write

    "(" { return tLPAREN; return tLPAREN2; }

but the token tLPAREN2 is never returned.

How can I make the lexer emit two tokens for a single character? It would help me get rid of conflicts in the grammar.


I'm using Flex version 2.6.3 (source code) and win_bison, which is based on Bison version 2.7.



Solution

  • The second return in your action is never reached, because the first return already leaves yylex(). You can get the effect you want with start states. Have the rule that recognizes the token set an (exclusive) start state, push the input back, and return the first token. Then, in that start state, recognize the same text again, switch back to the normal start state, and return the second token. So something like:

    %x Paren2
    
    %%
    
    "("             { BEGIN(Paren2); yyless(0); return tLPAREN; }
    <Paren2>"("     { BEGIN(INITIAL); return tLPAREN2; }
    

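    Note the call to yyless(0), which "pushes back" the ( so that it is recognized again by the second rule. Because Paren2 is an exclusive start state, only rules prefixed with <Paren2> are active while it is in effect.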
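
To see why the two rules above produce two tokens from one character, here is a minimal hand-written C sketch that mimics their behavior without flex. It is an illustration only, not flex-generated code: the `Scanner` struct, the `next_token` function, the `tOTHER` token, and the numeric token values are all assumptions made up for this example (in a real project the token codes come from the Bison-generated header).

```c
#include <stdio.h>

/* Hypothetical token codes; real ones come from the Bison-generated header. */
enum { tEOF = 0, tLPAREN = 258, tLPAREN2 = 259, tOTHER = 260 };

/* Stand-ins for the flex start states INITIAL and the exclusive Paren2. */
enum { INITIAL, Paren2 };

typedef struct {
    const char *input; /* text being scanned */
    size_t pos;        /* index of the next unread character */
    int state;         /* current start state */
} Scanner;

/* Returns one token per call, the way yylex() does. */
static int next_token(Scanner *s)
{
    char c = s->input[s->pos];

    if (c == '\0')
        return tEOF;

    if (s->state == Paren2) {
        /* Like <Paren2>"(": consume the character, go back to INITIAL. */
        s->pos++;
        s->state = INITIAL;
        return tLPAREN2;
    }

    if (c == '(') {
        /* Like "(" in INITIAL: switch state but do not advance pos,
           which is the effect of yyless(0). */
        s->state = Paren2;
        return tLPAREN;
    }

    /* Any other character is a single placeholder token. */
    s->pos++;
    return tOTHER;
}
```

Calling `next_token` repeatedly on the input `(a(` yields tLPAREN, tLPAREN2, tOTHER, tLPAREN, tLPAREN2 and then tEOF: the first call on each `(` leaves the character unread, so the second call sees it again in the other state.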