Negating inside lexer- and parser rules

How can the negation meta-character, ~, be used in ANTLR's lexer- and parser rules?

Solution

Negating can occur inside lexer and parser rules.

Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). But both lexer- and parser rules can only negate either single characters, or single tokens, respectively.

A couple of examples:

lexer rules

To match one or more characters except lowercase ascii letters, you can do:

NO_LOWERCASE : ~('a'..'z')+ ;

(the negation-meta-char, ~, has a higher precedence than the +, so the rule above equals (~('a'..'z'))+)

Note that 'a'..'z' matches a single character (and can therefor be negated), but the following rule is invalid:

ANY_EXCEPT_AB : ~('ab') ;

Because 'ab' (obviously) matches 2 characters, it cannot be negated. To match a token that consists of 2 character, but not 'ab', you'd have to do the following:

ANY_EXCEPT_AB 
  :  'a' ~'b' // any two chars starting with 'a' followed by any other than 'b'
  |  ~'a' .   // other than 'a' followed by any char
  ;

parser rules

Inside parser rules, ~ negates a certain token, or more than one token. For example, you have the following tokens defined:

A : 'A';
B : 'B';
C : 'C';
D : 'D';
E : 'E';

If you now want to match any token except the A, you do:

p : ~A ;

And if you want to match any token except B and D, you can do:

p : ~(B | D) ;

However, if you want to match any two tokens other than A followed by B, you cannot do:

p : ~(A B) ;

Just as with lexer rules, you cannot negate more than a single token. To accomplish the above, you need to do:

P
  :  A ~B
  |  ~A .
  ;

Note that the . (DOT) char in a parser rules does not match any character as it does inside lexer rules. Inside parser rules, it matches any token (A, B, C, D or E, in this case).

Note that you cannot negate parser rules. The following is illegal:

p : ~a ;
a : A  ;