lex fall-through

Command fall through in lex


I'm using lex in my program and I've run into a problem I need some help with.

My program accepts its input in the form of [something " something]. This is working correctly.

However, I also need to accept the form [something"something].

Is there a way that I can have some sort of first case in lex that all input is run through (like preprocessing), and then have that same, modified input continue on through the rest of my program?

Here's kind of what I'm talking about:

%%
.* {
   /* do preprocessing */
   }

something {
   return SOMETHING;
   }

\" {
   return QUOTE;
   }
%%

Solution

  • Well, you could actually write a preprocessor in lex and put it into your build system, but that's probably overkill!

    You can use start conditions, switching between them with BEGIN: parse the input in one start condition, use unput to push the (possibly modified) characters back onto the input stream, and then let a different start condition parse the result (see the Flex manual).
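
    As a rough sketch of that idea (certainly not the only way to structure it), the flex file below preprocesses each line in the INITIAL start condition by padding every quote with spaces, pushes the result back with unput, and then tokenizes it in a second start condition. The TOKENS condition name, the token codes, and the main driver are made up for illustration, and %option noyywrap is flex-specific.

    ```lex
    %option noyywrap
    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical token codes; a real parser would supply these. */
    enum { SOMETHING = 258, QUOTE };
    %}

    %x TOKENS

    %%

    <INITIAL>[^\n]+ {
            /* Preprocessing pass. Copy the line first: unput() clobbers yytext. */
            size_t len = (size_t)yyleng;
            char *line = malloc(len + 1);
            if (line == NULL) {
                fputs("out of memory\n", stderr);
                exit(EXIT_FAILURE);
            }
            memcpy(line, yytext, len + 1);

            /* Push back in reverse order so the text is re-read left to right,
               surrounding each quote with spaces. */
            while (len-- > 0) {
                if (line[len] == '"') {
                    unput(' ');
                    unput('"');
                    unput(' ');
                } else {
                    unput(line[len]);
                }
            }
            free(line);
            BEGIN(TOKENS);
        }

    <INITIAL>\n { /* empty line: nothing to preprocess */ }

    <TOKENS>\"           { return QUOTE; }
    <TOKENS>[^" \t\r\n]+ { return SOMETHING; }
    <TOKENS>[ \t\r]+     { /* skip whitespace */ }
    <TOKENS>\n           { BEGIN(INITIAL); /* next line is preprocessed again */ }

    %%

    int main(void)
    {
        int tok;
        while ((tok = yylex()) != 0)
            printf("%s\t%s\n", tok == QUOTE ? "QUOTE" : "SOMETHING", yytext);
        return 0;
    }
    ```

    In the actual .l file the directives and patterns have to start in the first column; they are indented here only to sit inside this answer. Pushing back more text than was matched relies on there being room in flex's input buffer, so very long lines could overflow it.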

    I recently wrote a parser for a Python-like config language that did just that. The parser had two modes (start conditions): one to count tabs at the start of a line to determine scope, and another to do the actual parsing.

    These methods are fine, but there is usually a simpler way of doing it, especially if your input scheme isn't hugely complex.

    Is there a grammatical difference between [something " something] and [something"something] for your program? Would a whitespace-eating rule do the trick?

    Could you describe your language and grammar a little more?

    After Comment:

    OK, so basically you have two tokens, SOMETHING and QUOTE. If your tokens are separated by whitespace, you can do the following:

    %%
    \"     {
           //this will match a single double-quote character
           return QUOTE;
           }
    
    [^" \t\n\r]+   {
                   //this will match a run of anything that's not a quote, space, tab or line ending
                   return SOMETHING;
                   }
    
    [ \t\n\r]      {
                   //do nothing: i.e. ignore whitespace
                   }
    
    %%
    

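    For your SOMETHING token you could also match something like [A-Za-z_][A-Za-z0-9_]*, which will match a letter or an underscore followed by zero or more letters, underscores and digits. Either pattern stops at a quote, so [something"something] produces the same SOMETHING QUOTE SOMETHING sequence as [something " something].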
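
    To try the rules above on both input forms, they can be dropped into a complete flex file with a small driver. This is only a sketch: the token codes and the main function are stand-ins for whatever your parser provides, %option noyywrap is flex-specific, and the file name scan.l used below is arbitrary.

    ```lex
    %option noyywrap
    %{
    #include <stdio.h>

    /* Hypothetical token codes; normally these come from the parser's header. */
    enum { SOMETHING = 258, QUOTE };
    %}

    %%

    \"            { return QUOTE; }
    [^" \t\n\r]+  { return SOMETHING; }
    [ \t\n\r]+    { /* ignore whitespace between tokens */ }

    %%

    int main(void)
    {
        int tok;
        /* yylex() returns 0 at end of input. */
        while ((tok = yylex()) != 0)
            printf("%s\t%s\n", tok == QUOTE ? "QUOTE" : "SOMETHING", yytext);
        return 0;
    }
    ```

    Built with something like flex scan.l followed by cc lex.yy.c -o scan, feeding it either something " something or something"something on standard input should print the same three tokens: SOMETHING, QUOTE, SOMETHING. As in the rules above, the patterns must start in the first column of the real file.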

    Does that help?