The current commands I'm using to search for some hex values (say 0A 8b 02) involve:
find . -type f -not -name "*.png" -exec xxd -p {} \; | grep "0a8b02" || xargs -0 -P 4
Is it possible to improve this given the following goals:
- … (the example above searches all files except .png files)

I'm not too confident whether the xargs is working properly for 4 processors. Also, I'm having difficulty printing the filename when grep finds a match, since grep's input is piped from xxd. Any suggestions?
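IF:
- you have GNU grep,
- AND the bytes you search for include no newlines (0xa)[1],
- AND, if they include NUL (0x0) bytes, you provide the grep search string via a file (-f) rather than as a direct argument,

then the following command would get you there, using the example of searching for 0e 8b 02 (the question's own example, 0A 8b 02, starts with a newline byte and is therefore ruled out):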
LC_ALL=C find . -type f -not -name "*.png" -exec grep -FHoab $'\x{0e}\x{8b}\x{02}' {} + |
LC_ALL=C cut -d: -f1-2
The grep command produces output lines of the form
<filename>:<byte-offset>:<matched-bytes>
which LC_ALL=C cut -d: -f1-2 then reduces to <filename>:<byte-offset>.
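A quick way to check that output format is to run the command in a scratch directory whose contents you control. The sketch below assumes GNU grep; the file names are made up for the demo, and the pattern is built with printf and octal escapes (\016\213\002 are the bytes 0e 8b 02) instead of $'...' only so that it also runs under a plain POSIX sh.

```sh
#!/bin/sh
# Sketch (assumes GNU grep): run the answer's command on known data.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical files:
printf 'AB\016\213\002CD' > sample.bin   # 'A' 'B' 0e 8b 02 'C' 'D': match at byte 2
printf '\016\213\002' > skip.png         # same bytes, but excluded by -not -name
printf 'no match here' > other.bin       # no match at all

# The bytes 0e 8b 02, built portably (same effect as $'\x0e\x8b\x02' in bash)
pat=$(printf '\016\213\002')

# Same search as in the answer; LC_ALL=C on find is inherited by grep
result=$(LC_ALL=C find . -type f -not -name "*.png" \
  -exec grep -FHoab "$pat" {} + |
  LC_ALL=C cut -d: -f1-2)
printf '%s\n' "$result"   # ./sample.bin:2
```

Because cut splits on every colon, a filename that itself contains a : would be cut short in this output.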
The command almost works with BSD grep, except that the byte offset it reports is invariably that of the start of the line the match was found on.
In other words, the byte offset is only correct if the match happens to begin a line, that is, if it starts the file or immediately follows a newline.
Also, BSD grep doesn't support specifying NUL (0x0) bytes as part of the search string, not even when provided via a file with -f.
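With GNU grep, by contrast, a byte sequence containing NUL can be searched for by writing it to a file and passing that file with -f; it cannot be passed as a command-line argument at all, because arguments are NUL-terminated C strings. A minimal sketch, assuming GNU grep and GNU cut; the sequence 0e 00 02 and the file names are hypothetical:

```sh
#!/bin/sh
# Sketch (assumes GNU grep): search for bytes that include NUL via -f.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical pattern 0e 00 02, stored in a file (no trailing newline)
printf '\016\000\002' > pattern.bin
# Hypothetical data: 'x' 'y', then the pattern bytes -> match at byte 2
printf 'xy\016\000\002z' > data.bin

# -f reads the search string from the file, NUL byte included;
# -a makes grep treat data.bin as text even though it contains NUL.
result=$(LC_ALL=C grep -FHoab -f pattern.bin data.bin | LC_ALL=C cut -d: -f1-2)
printf '%s\n' "$result"   # data.bin:2
```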
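The GNU grep command above minimizes the number of grep invocations by using find's -exec ... +, which, like xargs, passes as many filenames as will fit on a command line to grep at once.

By letting grep search for the byte sequence directly, there is no need for xxd. The search string is an ANSI C-quoted string ($'...') whose \x{...} escape sequences the shell expands into the actual bytes before grep runs, so grep can search for it as a literal string (-F), which is faster. The \x{...} form of these escapes isn't documented in the bash manual, but it works there, as well as in zsh (and ksh).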
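About the xargs in the question: in find ... | grep "0a8b02" || xargs -0 -P 4, xargs runs only if grep exits with a non-zero status, and even then it reads the shell's own standard input rather than a list of files, so it parallelizes nothing. If you do want several grep processes running at once, one option is to feed find -print0 into xargs -0 -P 4. The sketch below assumes GNU grep (for --line-buffered) and an xargs that supports -0 and -P (GNU and BSD xargs both do); the demo files and the batch size -n 5 are arbitrary, and real data would likely call for much larger batches.

```sh
#!/bin/sh
# Sketch: up to 4 grep processes in parallel via xargs -P.
# Assumes GNU grep and an xargs with -0 and -P; -n 5 is an arbitrary batch size.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/data"

# 20 hypothetical files, each containing 0e 8b 02 once, at byte offset 1
i=1
while [ "$i" -le 20 ]; do
  printf 'X\016\213\002Y' > "$tmp/data/file$i.bin"
  i=$((i + 1))
done

pat=$(printf '\016\213\002')   # the bytes 0e 8b 02, built portably

# -print0 / -0: pass filenames NUL-separated, so any name is safe.
# -n 5: at most 5 files per grep call, giving -P 4 batches to spread out.
# -H: a batch may hold a single file; keep the filename in the output anyway.
# --line-buffered: write each output line as soon as it is complete, which
#   keeps short lines from different grep processes from mixing.
result=$(LC_ALL=C find "$tmp/data" -type f -not -name "*.png" -print0 |
  LC_ALL=C xargs -0 -P 4 -n 5 grep -FHoab --line-buffered "$pat" |
  LC_ALL=C cut -d: -f1-2)

printf '%s\n' "$result"
count=$(printf '%s\n' "$result" | wc -l | tr -d ' ')
echo "matches: $count"   # matches: 20
```

The order of the output lines varies from run to run, since the batches finish in whatever order the processes complete.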
Alternatively, you could use -P (support for PCREs, Perl-compatible regular expressions) with escape sequences that grep interprets itself rather than having the shell pre-expand them, but this will be slower: grep -PHoab '\x{0e}\x{8b}\x{02}'
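For comparison, here is a sketch that runs the -P variant next to the -F one on the same sample file. It assumes a GNU grep built with PCRE support (not every build has it); in the C locale, \x{8b} then matches the single byte 0x8b, so both searches should report the same offset.

```sh
#!/bin/sh
# Sketch (assumes GNU grep with PCRE support): -F vs. -P on the same data.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample: 'A' 'B' 0e 8b 02 'C' 'D' -> match at byte 2
printf 'AB\016\213\002CD' > sample.bin

# -F: the shell (here printf) supplies the actual bytes
fixed=$(LC_ALL=C grep -FHoab "$(printf '\016\213\002')" sample.bin | LC_ALL=C cut -d: -f1-2)
# -P: grep's PCRE engine interprets the \x{..} escapes itself
pcre=$(LC_ALL=C grep -PHoab '\x{0e}\x{8b}\x{02}' sample.bin | LC_ALL=C cut -d: -f1-2)

printf '%s\n%s\n' "$fixed" "$pcre"   # sample.bin:2 (twice)
```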
- LC_ALL=C ensures that grep treats each byte as its own character without applying any encoding rules.
- -F treats the search string as a literal (rather than a regex).
- -H prepends the relevant input filename to each output line. Grep does this implicitly when given more than one filename argument, but -exec ... + may pass a single file (for instance, in its last batch), so -H keeps the output format consistent.
- -o reports only the matched strings (byte sequences), not the whole line (the concept of a line has no meaning in binary files anyway)[2].
- -a treats binary files as if they were text files (without it, Grep would only print the text Binary file <filename> matches for binary input files with matches).
- -b reports the byte offset of each match.

If it's sufficient to find at most one match per input file, add -m 1, which makes grep stop reading a file after its first matching line; note that, combined with -o, grep still reports every match on that line.
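A small sketch of what -m 1 does to the output, assuming GNU grep. The hypothetical sample file has two matches on its first "line" and a third one after a newline:

```sh
#!/bin/sh
# Sketch (assumes GNU grep): effect of -m 1 combined with -o.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

pat=$(printf '\016\213\002')   # the bytes 0e 8b 02
# Matches at offsets 1 and 6 on the first line, a newline at offset 9,
# then another match at offset 11.
printf 'X\016\213\002YZ\016\213\002\nW\016\213\002' > sample.bin

all=$(LC_ALL=C grep -FHoab "$pat" sample.bin | LC_ALL=C cut -d: -f1-2)
# -m 1 stops after the first matching *line*; -o still prints every
# match found on that line.
first=$(LC_ALL=C grep -m 1 -FHoab "$pat" sample.bin | LC_ALL=C cut -d: -f1-2)

printf 'without -m 1:\n%s\nwith -m 1:\n%s\n' "$all" "$first"
```

Without -m 1 this lists offsets 1, 6 and 11; with -m 1 it lists 1 and 6, because both of those matches sit on the first line.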
[1] Newlines cannot be used, because Grep invariably treats a newline in a search-pattern string as a separator between multiple search patterns. Also, Grep is line-based, so you can't match across lines. GNU Grep's --null-data (-z) option, which splits the input at NUL bytes instead of newlines, could help, but only if your search byte sequence doesn't also contain NUL bytes; you'd also have to represent your byte values as escape sequences in a regex combined with -P, because you'll need to use the escape sequence \n in lieu of actual newlines.
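A sketch of that -z/-P workaround, applied to the question's original sequence 0a 8b 02 (which starts with a newline). It assumes GNU grep with PCRE support and GNU cut (for its -z option); the sample file is hypothetical. Because -z also makes grep end each output record with NUL, and the matched bytes themselves contain a newline, cut is run in NUL-terminated mode too and tr turns the terminators back into newlines.

```sh
#!/bin/sh
# Sketch (assumes GNU grep with PCRE and GNU cut): match bytes that include 0x0a.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample: 'A' 'B' 0a 8b 02 'C' -> match at byte 2
printf 'AB\n\213\002C' > sample.bin

# -z: records are NUL-separated, so the newline byte can be part of a match.
# -P: grep interprets \x{0a} etc. itself; no literal newline in the pattern.
result=$(LC_ALL=C grep -zPHoab '\x{0a}\x{8b}\x{02}' sample.bin |
  LC_ALL=C cut -z -d: -f1-2 | tr '\0' '\n')
printf '%s\n' "$result"   # sample.bin:2
```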
[2] -o is needed to make -b report the byte offset of the match, as opposed to that of the beginning of the line (as stated, BSD Grep always does the latter, unfortunately). Additionally, it is beneficial to report only the matches themselves here: printing entire lines would produce unpredictably long output, given that there's no concept of lines in binary files. Either way, however, outputting bytes from a binary file may cause strange rendering behavior in the terminal.
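To see the difference this footnote describes, the sketch below compares -b with and without -o on a hypothetical file in which a newline precedes the match. It assumes GNU grep; the offset reported without -o (the start of the line) is the kind of value BSD Grep reports even with -o.

```sh
#!/bin/sh
# Sketch (assumes GNU grep): -b with and without -o.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

pat=$(printf '\016\213\002')   # the bytes 0e 8b 02
# 'a' 'b' '\n' 'c' 'd' 0e 8b 02 -> the line starts at byte 3, the match at byte 5
printf 'ab\ncd\016\213\002' > sample.bin

# Single input file and no -H, so each output line starts with the offset
line_off=$(LC_ALL=C grep -Fab "$pat" sample.bin | LC_ALL=C cut -d: -f1)
match_off=$(LC_ALL=C grep -Foab "$pat" sample.bin | LC_ALL=C cut -d: -f1)

printf 'without -o: %s\nwith -o: %s\n' "$line_off" "$match_off"
# without -o: 3
# with -o: 5
```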