I have an ANTLR grammar file with the following string definition:
STRING
: '"' (EscapeSequence | ~('\\'|'"') )* '"' ;
fragment EscapeSequence
: '\\' .
;
But this lexer rule ignores the escape character before the first quote. In id\=\" the quote is recognized as the start of a string even though it is preceded by an escape character. This happens only for the first quote; all subsequent quotes, if escaped, are recognized properly.
/id\=\"Testing\" -- Should not be a string as both quotes are escaped
/id\="Testing" -- Should be a string between the quotes, since they are not escaped
The main problem to solve is to prevent the lexer from trying to recognize a string when the character immediately preceding a quote is an escape character. If there are multiple escape characters, I need to consider only the single character before the starting quote.
ANTLR will automatically provide the behavior you desire in almost every situation. Consider the following input:
/id\=\"Testing\"
The critical requirement involves the location and length of the token preceding the first quote character. In the following block I add spaces only for illustrating conditions that occur between characters.
/ i d \ = \ " T e s t i n g \ "
           ^
           |
           ----------- Make sure no token can *end* here
By ensuring that the first " character is included in the same token as the \ character before it, you ensure that the first " character will never be interpreted as the start of a STRING token.
If the above condition is not met, your " character will be treated as the start of a STRING token.
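One way to satisfy this condition is to add a lexer rule that consumes a backslash together with the character after it. This is only a sketch: it assumes ANTLR 4's longest-match lexing, and it assumes no other rule in your grammar produces a token that ends on the \ just before the quote. The rule name ESCAPED_CHAR is hypothetical; STRING and EscapeSequence are taken from your grammar.

```antlr
STRING
    : '"' (EscapeSequence | ~('\\'|'"'))* '"'
    ;

// Hypothetical rule, not part of the original grammar: outside a STRING,
// a backslash and the character after it form a single token, so in \"
// the quote is consumed here and cannot begin a STRING.
ESCAPED_CHAR
    : EscapeSequence
    ;

fragment EscapeSequence
    : '\\' .
    ;
```

With a rule like this, /id\=\"Testing\" should produce no STRING token, while /id\="Testing" should still yield "Testing" as a STRING. Note that \\" would lex as an escaped backslash followed by an unescaped quote, so check that this matches how you want runs of escape characters to be handled.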