regex whitespace

Why is an underscore (_) not regarded as a non-word character?


Why is an underscore _ not regarded as a non-word character? The regex \W matches all non-word characters, but it does not match the underscore.


Solution

  • According to Jeffrey Friedl's book on regular expressions, this goes back to a change in Perl's regular expressions in 1988: \w was changed to match the characters allowed in a Perl variable name, which include the underscore [page 89]:

    Perl 2 was released in June 1988. Larry had replaced the regex code entirely, this time using a greatly enhanced version of the Henry Spencer package mentioned in the previous section. You could still have at most nine sets of parentheses, but now you could use | inside them. Support for \d and \s was added, and support for \w was changed to include an underscore, since then it would match what characters were allowed in a Perl variable name.
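
As a quick illustration, here is a minimal sketch in Python, whose re module follows the same convention of counting the underscore as a word character. The sample string and variable names are made up for the example. If you want the underscore treated as a non-word character, one common workaround is to add it to the class explicitly with [\W_]:

```python
import re

# Hypothetical sample input, chosen only to show the difference.
text = "foo_bar baz-qux"

# \w covers letters, digits, and the underscore, so \W skips "_".
non_word = re.findall(r"\W", text)

# Adding "_" to the class makes the underscore count as a separator too.
non_word_or_underscore = re.findall(r"[\W_]", text)

# Splitting shows the practical effect of each pattern.
split_default = re.split(r"\W+", text)
split_with_underscore = re.split(r"[\W_]+", text)

print(non_word)                # [' ', '-']
print(non_word_or_underscore)  # ['_', ' ', '-']
print(split_default)           # ['foo_bar', 'baz', 'qux']
print(split_with_underscore)   # ['foo', 'bar', 'baz', 'qux']
```

The same [\W_] character class should work in Perl and most other regex flavors, since it only combines \W with a literal underscore.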