Tags: regex, perl

Match pattern `\S` except when part of another pattern (ANSI escape)


I want to know whether the input contains any non-whitespace character, as matched by \S. However, the input may contain ANSI VT escape sequences (Wikipedia) for text color and style. These also match \S (even the ESC character itself matches \S). They affect the output in the console or terminal, but they do not count as non-whitespace for my purpose, which is to find out whether there is actual text other than whitespace.
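A quick way to see the problem is to test the individual characters of a color sequence against \S. The sketch below uses \e, Perl's double-quoted escape for ESC (\x1b). The variable names are illustrative only.

```perl
use strict;
use warnings;

# "\e[31m" turns the text red and "\e[0m" resets it; between them are only blanks.
my $colored_blank = "\e[31m   \e[0m";

# ESC is a control character, not whitespace, so it matches \S,
# as do '[', the digits and the final 'm'.
for my $char ("\e", '[', '3', '1', 'm') {
    my $shown = $char eq "\e" ? 'ESC' : $char;
    printf "%-3s matches \\S: %s\n", $shown, $char =~ /\S/ ? 'yes' : 'no';
}

# Consequently the naive test reports text where the terminal shows none.
# prints: naive check: has text
print 'naive check: ', ($colored_blank =~ /\S/ ? 'has text' : 'blank'), "\n";
```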

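I put together a good-enough RE for the ANSI VT escape sequences (\x1b\[[0-9;]{0,9}m; note that an empty lower bound such as {,9} is only recognized as a quantifier from Perl 5.34 on, and older perls match it as literal text, so {0,9} is the portable spelling). In the general case, and for Q&A purposes, it could be any “easier” dummy sequence, such as A[a-z]+m, that I would want to rule out as a match.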
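To check what the good-enough pattern does and does not cover, the sketch below tests a few sample sequences against it, anchored at both ends. The samples are illustrative assumptions, not an exhaustive list. As written, the pattern covers SGR (color/style) sequences with at most nine parameter characters, so a 24-bit color sequence or a non-SGR sequence such as \e[2J (erase display) is not recognized.

```perl
use strict;
use warnings;

# The pattern from the question, spelled with {0,9} so it also works before Perl 5.34.
my $ansi_sgr = qr/\x1b\[[0-9;]{0,9}m/;

# Ordered list of [ label, sequence ] pairs (illustrative samples only).
my @samples = (
    [ 'reset'        => "\e[0m"             ],
    [ 'short reset'  => "\e[m"              ],  # zero parameter characters
    [ 'bold red'     => "\e[1;31m"          ],
    [ '256-color'    => "\e[38;5;196m"      ],  # 8 parameter characters
    [ '24-bit color' => "\e[38;2;255;0;0m"  ],  # 12 parameter characters: too long
    [ 'clear screen' => "\e[2J"             ],  # not SGR: does not end in 'm'
);

for my $sample (@samples) {
    my ($label, $seq) = @$sample;
    # \A ... \z: the whole sample must be one escape sequence.
    printf "%-13s %s\n", $label, $seq =~ /\A$ansi_sgr\z/ ? 'matched' : 'not matched';
}
```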

So how to proceed? What I found easiest is to first cleanse the input (note the /r modifier, which returns a cleansed copy and leaves the input unharmed):

$cleansed = $input =~ s/A[a-z]+m//gr;          # easy dummy

… or alternatively (for the real ANSI VT thing):

$cleansed = $input =~ s/\x1b\[[0-9;]{0,9}m//gr; # ANSI VT

… and then match the cleansed copy against \S. And this works.
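The two-pass approach can be packaged as a small function. The name has_text_two_pass is hypothetical; the body is just the s///r cleanse followed by the \S match described above (s///r needs Perl 5.14 or later).

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $input contains text other than
# whitespace and ANSI color/style sequences, 0 otherwise.
sub has_text_two_pass {
    my ($input) = @_;
    # Pass 1: strip the escape sequences; /r returns a copy, $input is untouched.
    my $cleansed = $input =~ s/\x1b\[[0-9;]{0,9}m//gr;
    # Pass 2: any remaining non-whitespace is real text.
    return $cleansed =~ /\S/ ? 1 : 0;
}

my @cases = (
    [ 'only escapes and blanks' => "\e[1;31m \t\e[0m\n" ],
    [ 'colored word'            => "\e[32mok\e[0m"      ],
    [ 'empty string'            => ""                   ],
);

for my $case (@cases) {
    my ($label, $input) = @$case;
    printf "%-24s => %s\n", $label, has_text_two_pass($input) ? 'has text' : 'blank';
}
```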

So this is a two-pass approach, and it is easily understood (not the least important aspect). But I'm wondering whether there is a smarter, yet still clear, one-pass approach (one that will inevitably take my RE insight to the next level)?


Solution

  • Instead of searching for what you want to find (\S, except as part of an escape sequence), check whether the string is made up entirely of what you don't want to find (\s and escape sequences), and negate the result.

    !/^(?:\x1b\[[0-9;]{0,9}m|\s)*\z/
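A minimal sketch of the one-pass check, wrapped in a hypothetical function has_text_one_pass and run against a few hand-picked inputs whose expected results are the ones the two-pass approach gives. The pattern is the one from the answer, spelled with {0,9} for portability to Perl versions before 5.34.

```perl
use strict;
use warnings;

# Hypothetical helper: the string is "blank" if it consists entirely of
# escape sequences and whitespace; otherwise it contains real text.
sub has_text_one_pass {
    my ($input) = @_;
    return $input !~ /^(?:\x1b\[[0-9;]{0,9}m|\s)*\z/ ? 1 : 0;
}

my @cases = (
    # [ input, expected, description ]
    [ "",                 0, 'empty string' ],
    [ " \t\n",            0, 'whitespace only' ],
    [ "\e[1;31m  \e[0m",  0, 'color codes around blanks' ],
    [ "\e[32mok\e[0m",    1, 'colored word' ],
    [ "\e[31",            1, 'unterminated sequence (ESC counts as text)' ],
);

for my $case (@cases) {
    my ($input, $expected, $desc) = @$case;
    my $got = has_text_one_pass($input);
    printf "%-45s %s\n", $desc, $got == $expected ? 'ok' : "MISMATCH (got $got)";
}
```

Because ESC is not whitespace, the two alternatives can never both start at the same position, so the engine never has to choose between them; the only backtracking happens inside the bounded [0-9;]{0,9}. An incomplete sequence such as \e[31 is not consumed by the escape alternative, so its ESC counts as text, which matches what the two-pass approach reports.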