I sometimes want to match whitespace but not newline.
So far I've been resorting to [ \t]
. Is there a less awkward way?
Perl versions 5.10 and later support subsidiary vertical and horizontal character classes, \v
and \h
, as well as the generic whitespace character class \s
The cleanest solution is to use the horizontal whitespace character class \h
. This will match tab and space from the ASCII set, non-breaking space from extended ASCII, or any of these Unicode characters
U+0009 CHARACTER TABULATION
U+0020 SPACE
U+00A0 NO-BREAK SPACE (not matched by \s)
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
The vertical space pattern \v
is less useful, but matches these characters
U+000A LINE FEED
U+000B LINE TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0085 NEXT LINE (not matched by \s)
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
There are seven vertical whitespace characters which match \v
and eighteen horizontal ones which match \h
. \s
matches twenty-three characters
All whitespace characters are either vertical or horizontal with no overlap, but they are not proper subsets because \h
also matches U+00A0 NO-BREAK SPACE, and \v
also matches U+0085 NEXT LINE, neither of which are matched by \s