regex perl

Match whitespace but not newlines


I sometimes want to match whitespace but not newlines.

So far I've been resorting to [ \t]. Is there a less awkward way?


Solution

  • Perl versions 5.10 and later support the subsidiary vertical and horizontal whitespace character classes, \v and \h, as well as the generic whitespace character class \s.

    The cleanest solution is to use the horizontal whitespace character class \h. It matches tab and space from the ASCII set, non-breaking space from extended ASCII (Latin-1), and the other Unicode horizontal spaces. In full, it matches any of these characters:

    U+0009 CHARACTER TABULATION
    U+0020 SPACE
    U+00A0 NO-BREAK SPACE (not matched by \s)
    
    U+1680 OGHAM SPACE MARK
    U+2000 EN QUAD
    U+2001 EM QUAD
    U+2002 EN SPACE
    U+2003 EM SPACE
    U+2004 THREE-PER-EM SPACE
    U+2005 FOUR-PER-EM SPACE
    U+2006 SIX-PER-EM SPACE
    U+2007 FIGURE SPACE
    U+2008 PUNCTUATION SPACE
    U+2009 THIN SPACE
    U+200A HAIR SPACE
    U+202F NARROW NO-BREAK SPACE
    U+205F MEDIUM MATHEMATICAL SPACE
    U+3000 IDEOGRAPHIC SPACE
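
    As a minimal sketch of \h in practice, the snippet below collapses runs of horizontal whitespace while leaving line breaks alone, and contrasts that with \s. The sample string and the variable names are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input: two lines with mixed tabs and spaces.
    my $text = "alpha \t beta\ngamma\t\tdelta\n";

    # \h+ matches runs of horizontal whitespace only, so "\n" is untouched.
    (my $collapsed = $text) =~ s/\h+/ /g;

    # \s+ also matches "\n", so the two lines are merged into one.
    (my $squashed = $text) =~ s/\s+/ /g;

    # Splitting on "\n" shows that the \h version kept both lines.
    my @lines = split /\n/, $collapsed;

    print $collapsed;
    print scalar(@lines), " lines kept\n";
    ```

    The copy-then-substitute form `(my $copy = $text) =~ s/.../.../` is used instead of the /r modifier so that the sketch should also run on Perl 5.10.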
    

    The vertical whitespace class \v is less useful, but it matches these characters:

    U+000A LINE FEED
    U+000B LINE TABULATION
    U+000C FORM FEED
    U+000D CARRIAGE RETURN
    U+0085 NEXT LINE (not matched by \s)
    
    U+2028 LINE SEPARATOR
    U+2029 PARAGRAPH SEPARATOR
    

    There are seven vertical whitespace characters, which match \v, and eighteen horizontal ones, which match \h. \s matches twenty-three characters.
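
    The counts can be checked with a rough sketch like the one below. It assumes that every whitespace character lies below the surrogate range, so it only scans U+0000 through U+D7FF. The variable names are illustrative, and a Perl built against an older Unicode version may report a different number of horizontal characters.

    ```perl
    use strict;
    use warnings;

    my ($h, $v, $both) = (0, 0, 0);

    # Scan the code points below the surrogates (assumed to contain
    # every character that \h or \v can match).
    for my $cp (0 .. 0xD7FF) {
        my $ch   = chr $cp;
        my $is_h = $ch =~ /\h/ ? 1 : 0;
        my $is_v = $ch =~ /\v/ ? 1 : 0;

        $h++    if $is_h;
        $v++    if $is_v;
        $both++ if $is_h && $is_v;   # stays at zero if the classes are disjoint
    }

    printf "horizontal: %d, vertical: %d, both: %d\n", $h, $v, $both;
    ```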

    All whitespace characters are either vertical or horizontal, with no overlap. However, \h and \v are not proper subsets of \s, because \h also matches U+00A0 NO-BREAK SPACE and \v also matches U+0085 NEXT LINE, neither of which is matched by \s under Perl's default matching rules.
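
    The two exceptions can be observed with a small sketch such as the following. It assumes Perl 5.14 or later for the /u modifier, and it deliberately avoids `use v5.12` or higher, which would enable the unicode_strings feature and change how \s treats these byte strings. The hash keys are illustrative names only.

    ```perl
    use strict;
    use warnings;

    my $nbsp = "\xA0";   # U+00A0 NO-BREAK SPACE as a plain byte string
    my $nel  = "\x85";   # U+0085 NEXT LINE as a plain byte string

    my %match = (
        nbsp_h   => ($nbsp =~ /\h/  ? 1 : 0),   # horizontal class
        nbsp_s   => ($nbsp =~ /\s/  ? 1 : 0),   # default rules
        nbsp_s_u => ($nbsp =~ /\s/u ? 1 : 0),   # Unicode rules forced with /u
        nel_v    => ($nel  =~ /\v/  ? 1 : 0),   # vertical class
        nel_s    => ($nel  =~ /\s/  ? 1 : 0),   # default rules
        nel_s_u  => ($nel  =~ /\s/u ? 1 : 0),   # Unicode rules forced with /u
    );

    printf "%-9s %d\n", $_, $match{$_} for sort keys %match;
    ```

    With the default rules, \s should reject both characters, while \h and \v accept them; forcing Unicode rules with /u makes \s accept them as well.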