Given a positional language like the old IBM RPG, we can have a line such as
CCCCCDIDENTIFIER E S 10
Where characters
1-5: comment
6: specification type
7-21: identifier name
...And so on
Now, given that JFlex is based on RegExp, we would have a RegExp such as:
[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}
for the identifier name
token.
This RegExp however can match tokens longer than the 15 characters possible for identifier name
, requiring yypushback
s.
Thus, is there a way to limit how many characters JFlex reads for a particular token?
Regular expression based lexical analysis is really not the right tool to parse fixed-field inputs. You can just split the input into fields at the known character positions, which is way easier and a lot faster. And it doesn't require fussing with regular expressions.
Anyway, [a-zA-Z][a-zA-Z0-9]{0,14}[ ]{0,14}
wouldn't be the right expression even if it did properly handle the token length, since the token is really the word at the beginning, without space characters.
In the case of fixed-length fields which contain something more complicated than a single identifier, you might want to feed the field into a lexer, using a StringReader or some such.
Although I'm sure it's not useful, here's a regular expression which matches 15 characters which start with a word and are completed with spaces:
[a-zA-Z][ ]{14} |
[a-zA-Z][a-zA-Z0-9][ ]{13} |
[a-zA-Z][a-zA-Z0-9]{2}[ ]{12} |
[a-zA-Z][a-zA-Z0-9]{3}[ ]{11} |
[a-zA-Z][a-zA-Z0-9]{4}[ ]{10} |
[a-zA-Z][a-zA-Z0-9]{5}[ ]{9} |
[a-zA-Z][a-zA-Z0-9]{6}[ ]{8} |
[a-zA-Z][a-zA-Z0-9]{7}[ ]{7} |
[a-zA-Z][a-zA-Z0-9]{8}[ ]{6} |
[a-zA-Z][a-zA-Z0-9]{9}[ ]{5} |
[a-zA-Z][a-zA-Z0-9]{10}[ ]{4} |
[a-zA-Z][a-zA-Z0-9]{11}[ ]{3} |
[a-zA-Z][a-zA-Z0-9]{12}[ ]{2} |
[a-zA-Z][a-zA-Z0-9]{13}[ ] |
[a-zA-Z][a-zA-Z0-9]{14}
(That might have to be put on one very long line.)