'[^']*\'
I use this rule to make lex match strings. It works fine when the string is shorter than about 9,000 characters, so how do I get lex to match strings longer than that?
Should I change the rule, or is there a setting I need to adjust? Any help would be appreciated.
You can switch from the predefined INITIAL state to some other state, SQSTR, when you encounter '. Within the SQSTR state, you switch back to INITIAL when you encounter an unescaped '. Otherwise, you stay in SQSTR and append characters to the token. How best to handle errors and string growth with respect to memory allocation is an exercise left to the reader. Multi-line strings are also straightforward. And, of course, you should recognize an obvious refactoring opportunity, which should be glaring red if you try to add multi-line string support.
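%{
#include <stdio.h>
#include <stdlib.h>

char *str;  /* contents of the string literal being scanned */
int len;    /* number of characters in str, excluding the NUL */
%}
%x SQSTR
%%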
<INITIAL>' {
    /* Opening quote: start with an empty string and enter SQSTR. */
    str = malloc(1);
    len = 0;
    *str = 0;
    BEGIN(SQSTR);
}
<SQSTR>\\' {
    /* Escaped quote: store a literal '. Needs len+2 bytes: new char plus NUL. */
    str = realloc(str, len+2);
    str[len] = '\'';
    str[len+1] = 0;
    len++;
}
<SQSTR>' {
    /* Closing quote: report the string, release it, and leave SQSTR. */
    printf("length of str is %d. First 10 is '%.10s' and last 10 are '%s'\n", len, str, len>=10 ? str+len-10 : str);
    free(str);
    BEGIN(INITIAL);
}
<SQSTR>. {
    /* Any other character except newline. Needs len+2 bytes: new char plus NUL. */
    str = realloc(str, len+2);
    str[len] = *yytext;
    str[len+1] = 0;
    len++;
}
%%
int yywrap () {
    return 1;
}
int main (int argc, char *argv[]) {
    yylex();
    return 0;
}
$ wc bigger
1 5 16337 bigger
$ flex t.l && gcc -g lex.yy.c && ./a.out < bigger
length of str is 16334. First 10 is 'aaaaaaaaaa' and last 10 are 'aaaaaaaaaa'
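The answer leaves string growth as an exercise, and the scanner above calls realloc once per character. A minimal sketch of one way to handle it, assuming a doubling buffer behind a small helper: the names strbuf, strbuf_putc, and strbuf_reset are hypothetical and not part of flex. It also folds the duplicated append code into a single function, which is one reading of the refactoring opportunity mentioned above.

```c
#include <stdlib.h>

/* Hypothetical growable string buffer; none of these names come from flex. */
struct strbuf {
    char  *data;  /* NUL-terminated contents, or NULL before the first append */
    size_t len;   /* number of characters stored, excluding the NUL */
    size_t cap;   /* bytes currently allocated for data */
};

/* Append one character, doubling the capacity when the buffer is full.
 * Returns 0 on success, or -1 if realloc fails (the buffer is left intact). */
static int strbuf_putc(struct strbuf *b, char c)
{
    if (b->len + 2 > b->cap) {                 /* room for c plus the NUL */
        size_t ncap = b->cap ? b->cap * 2 : 64;
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

/* Release the storage and reset the buffer for the next string. */
static void strbuf_reset(struct strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

With something like this in the definitions section, both SQSTR append rules could reduce to a single call such as strbuf_putc(&buf, *yytext), checking the return value for allocation failure, and the closing-quote rule could call strbuf_reset(&buf) after printing.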
Edit #1: In the original post, I mistakenly placed the more general . rule before the ' rule. Silly me.
Edit #2: Added main and debugging output.