regex perl

Perl \s metacharacter not matching when an empty line exists


I have two similar files that contain the following:

hello


  peppa

The only difference is that line 3 of ./c/d/b.txt is a blank line containing a space, while line 3 of ./c/d/m.txt is an empty line.

Running find . -name "*.txt" -type f -exec perl -00 -ne 'print "$ARGV\n" if ($_ =~ /hello\s+peppa/msi);' {} \; prints ./c/d/b.txt but not ./c/d/m.txt. I expected it to print ./c/d/m.txt as well.

Below are hexdumps of the files, in case they help:

[user@host]$ hexdump -v -C ./c/d/m.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 0a 20 20 20 20  |hello.    ..    |
00000010  70 65 70 70 61 0a                                 |peppa.|
00000016

[user@host]$ hexdump -v -C ./c/d/b.txt
00000000  68 65 6c 6c 6f 0a 20 20  20 20 0a 20 0a 20 20 20  |hello.    . .   |
00000010  20 70 65 70 70 61 0a                              | peppa.|
00000017


I verified that this occurs with both Perl 5.16 and 5.38.


Solution

  • -00 is special: it doesn't mean "separated by a null byte". From perldoc perlrun:

    The special value 00 will cause Perl to slurp files in paragraph mode.

    Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose. The "-g" flag is a simpler alias for it.

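    In paragraph mode, Perl doesn't read the whole file at once: an empty line ends the current record. In m.txt the empty line 3 separates two paragraphs, so hello and peppa never end up in the same $_ and the regex cannot match. In b.txt line 3 contains a space, so it is not empty and the whole file is read as a single paragraph.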

    By the way, to use the null byte as the record separator, use just -0 with no digits.

    So your solution is to use -0777 instead of -00.