
What does a character class with only a lone caret do?


While trying to answer the question "Writing text into new line when a particular character is found", I used Regexp::Grammars. It has long interested me, and I finally had a reason to learn it. I noticed that in the description section the author includes a LaTeX parser (I am an avid LaTeX user, so this interested me), but it has one odd construct, seen here:

    <rule: Option>     [^][\$&%#_{}~^\s,]+

    <rule: Literal>    [^][\$&%#_{}~^\s]+

What do the [^] character classes accomplish?


Solution

  • [^][…] is not two character classes but a single negated character class that matches any character except ], [, and … (see Special Characters Inside a Bracketed Character Class in perlrecharclass):

    However, if the ] is the first (or the second if the first character is a caret) character of a bracketed character class, it does not denote the end of the class (as you cannot have an empty class) and is considered part of the set of characters that can be matched without escaping.

    Examples:

    "+"   =~ /[+?*]/     #  Match, "+" in a character class is not special.
    "\cH" =~ /[\b]/      #  Match, \b inside a character class
                         #  is equivalent to a backspace.
    "]"   =~ /[][]/      #  Match, as the character class contains
                         #  both [ and ].
    "[]"  =~ /[[]]/      #  Match, the pattern contains a character class
                         #  containing just [, and the character class is
                         #  followed by a ].
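
    To see the rule applied to the class from the question, here is a minimal sketch. It copies the Literal pattern into a plain qr// outside of any grammar; the helper name first_literal and the sample inputs are made up for illustration and are not part of Regexp::Grammars.

```perl
use strict;
use warnings;

# The character class from the Literal rule, used as an ordinary regex.
# Because ] directly follows the caret, it is a literal member of the
# negated set, so the class excludes ], [, $, &, %, #, _, {, }, ~, ^
# and whitespace.
my $literal = qr/[^][\$&%#_{}~^\s]+/;

# Hypothetical helper: return the first run of allowed characters,
# or undef if the string contains none.
sub first_literal {
    my ($text) = @_;
    return $text =~ /($literal)/ ? $1 : undef;
}

for my $input ('foo[bar]', ']', 'a^b', 'x y') {
    my $got = first_literal($input);
    printf "%-10s => %s\n", $input, defined $got ? "'$got'" : 'no match';
}
# foo[bar]   => 'foo'
# ]          => no match
# a^b        => 'a'
# x y        => 'x'
```

    The run stops at the first [ in the first input, and a lone ] does not match at all, which is what you would expect if ] and [ are both members of the excluded set rather than delimiters of a second class.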