When I include the NULL character (\x00) in a regex character range in BSD grep, the result is unexpected: no characters match. Why is this happening?
Here is an example:
$ echo 'ABCabc<>/ă' | grep -o [$'\x00'-$'\x7f']
Here I expect every character except the last one to match; however, there is no output (no matches).
Alternatively, when I start the character range from \x01, it works as expected:
$ echo 'ABCabc<>/ă' | grep -o [$'\x01'-$'\x7f']
A
B
C
a
b
c
<
>
/
Also, here are my grep and BASH versions:
$ grep --version
grep (BSD grep) 2.5.1-FreeBSD
$ echo $BASH_VERSION
3.2.57(1)-release
Noting that $'...' is a shell quoting construct, this,
$ echo 'ABCabc<>/ă' | grep -o [$'\x00'-$'\x7f']
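would try to pass a literal NUL character as part of the command line argument to grep. That's impossible to do in any Unix-like system, as the command line arguments are passed to the process as NUL-terminated strings. So in effect, grep never sees the NUL: at best it would get the arguments -o and [, with everything from the NUL onward cut off. Bash in particular drops the NUL while expanding $'\x00', so there the pattern most likely ends up as [-$'\x7f'], which matches only - and DEL. Either way, nothing in your input matches.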
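To see what the shell really builds from those quoted pieces, you can dump the resulting word as hex bytes. This is a minimal sketch that assumes Bash (other shells handle $'...' differently), and the variable name bytes is just for illustration. The brackets are quoted here so that the word is not treated as a glob.

```shell
#!/usr/bin/env bash
# Build the same word as in the question, but with the brackets quoted,
# and dump the bytes that would reach the command as its argument.
# Assumption: Bash, which discards the NUL produced by $'\x00'.
bytes=$(printf '%s' '['$'\x00''-'$'\x7f'']' | od -An -tx1)
echo "$bytes"
```

In Bash this should show 5b 2d 7f 5d, that is [, -, DEL and ], with no 00 byte anywhere.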
You would need to create some pattern that matches the NUL byte without including it literally. But I don't think grep supports the \000 or \x00 escapes itself. Perl does, though, so this prints the input line with the NUL:
$ printf 'foo\nbar\0\n' |perl -ne 'print if /\000/'
bar
As an aside, at least GNU grep doesn't seem to like that kind of a range expression, so if you were to use that, you'd need to do something different. In the C locale, [[:cntrl:][:print:]] might perhaps work to match the characters from \x01 to \x7f, but I didn't check comprehensively.
The manual for grep has some descriptions of the classes.
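As a rough sketch of the class-based approach, assuming a grep that supports -o and -a (GNU grep does; I have not checked this on BSD grep), with the variable name matches chosen only for illustration:

```shell
# LC_ALL=C restricts [:cntrl:] and [:print:] to ASCII characters.
# -a makes grep treat the input as text even if it looks binary.
matches=$(echo 'ABCabc<>/ă' | LC_ALL=C grep -a -o '[[:cntrl:][:print:]]')
printf '%s\n' "$matches"
```

The bracket expression is single-quoted, so the shell passes it to grep unchanged.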
Note also that [$'\x00'-$'\x7f'] has an unquoted pair of [ and ] and so is a shell glob. This isn't related to the NUL byte, but if you had files that match the glob (any one-letter names, if the glob works on your system; it doesn't on my Linux), or had failglob or nullglob set, it would probably give results you didn't want. Instead, quote the brackets too, as in $'[\x01-\x7f]'. Starting the range at \x00 inside the quotes would still not work, since the NUL is lost before grep sees it.