grep can't be fed "raw" strings when used from the command line, since some characters need to be escaped to not be treated as literals. For example:
$ grep '(hello|bye)' # WON'T MATCH 'hello'
$ grep '\(hello\|bye\)' # GOOD, BUT QUICKLY BECOMES UNREADABLE
I was using printf to auto-escape strings:
$ printf '%q' '(some|group)\n'
\(some\|group\)\\n
This produces a bash-escaped version of the string, and using backticks, this can easily be passed to a grep call:
$ grep `printf '%q' '(a|b|c)'`
However, it's clearly not meant for this: some characters in the output are not escaped, and some are unnecessarily so. For example:
$ printf '%q' '(^#)'
\(\^#\)
The ^ character should not be escaped when passed to grep.
Is there a CLI tool that takes a raw string and returns a bash-escaped version of the string that can be directly used as a pattern with grep? If not, how can I achieve this in pure bash?
If you are attempting to get grep to use Extended Regular Expression syntax, the way to do that is to use grep -E (aka egrep). You should also know about grep -F (aka fgrep) and, in newer versions of GNU grep, grep -P.
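As a rough illustration of the difference between these modes, here is a small sketch. The sample() helper and its three input lines are made up for the example; grep -P is left out because it depends on how grep was built.

```shell
# Hypothetical sample input: three lines fed to each grep call below.
sample() { printf '%s\n' 'hello world' 'goodbye' '(hello|bye)'; }

# ERE (-E): unescaped ( | ) mean grouping and alternation,
# so all three lines match.
sample | grep -E '(hello|bye)'

# Fixed string (-F): every character is literal,
# so only the line containing the text "(hello|bye)" matches.
sample | grep -F '(hello|bye)'
```

If the goal is simply to search for a raw string with no regex meaning at all, grep -F avoids the need for any escaping.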
Background: The original grep had a fairly small set of regex operators; it was Ken Thompson's original regular expression implementation. A new version with an extended repertoire was developed later and, for compatibility reasons, got a different name. With GNU grep, there is only one binary, which understands the traditional, basic RE syntax if invoked as grep, and ERE if invoked as egrep. Some constructs from egrep are available in grep by using a backslash escape to introduce special meaning.
Subsequently, the Perl programming language extended the formalism even further; this regex dialect seems to be what most newcomers erroneously expect grep to support as well. With grep -P, it does; but this is not yet widely supported on all platforms.
So, in grep, the following characters have a special meaning: ^$[]*.\
In egrep, the following characters also have a special meaning: ()|+?{}. (The braces for repetition were not in the original egrep.) The grouping parentheses also enable backreferences with \1, \2, etc.
In many versions of grep, you can get the egrep behavior by putting a backslash before the egrep specials. There are also special sequences like \< and \>, which match at the start and end of a word.
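A short sketch of how the same constructs look in each dialect. The input strings are invented for the example, and \< and \> are extensions that GNU grep supports but that are not guaranteed everywhere.

```shell
# Basic syntax: unescaped parentheses are literal, so only "(hello)" matches.
printf '%s\n' 'hello' '(hello)' | grep '(hello)'

# Basic syntax with backslashes: grouping plus an interval; matches "abab".
printf '%s\n' 'abab' 'ab' | grep '\(ab\)\{2\}'

# The same pattern in extended syntax; also matches "abab".
printf '%s\n' 'abab' 'ab' | grep -E '(ab){2}'

# Word boundaries (assumes a grep that supports \< and \>):
# matches "cat" but not "concatenate".
printf '%s\n' 'cat' 'concatenate' | grep '\<cat\>'
```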
In Perl, a huge number of additional escapes like \w, \s, and \d were introduced. In Perl 5, the regex facility was substantially extended, with non-greedy matching (*?, +?, etc.), non-grouping parentheses (?:...), lookaheads, lookbehinds, etc.
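For a feel of these Perl features from the command line, here is a hedged sketch. It probes for grep -P first, since that option only works where grep was built with PCRE support; the input strings are invented for the example.

```shell
# Probe first: grep -P exists only where grep was built with PCRE support.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # \d is a Perl escape for a digit; only the line with digits matches.
    printf '%s\n' 'order 42' 'no digits' | grep -P '\d+'

    # Non-greedy +? stops at the first '>', so -o prints two separate matches.
    printf '%s\n' '<a><b>' | grep -oP '<.+?>'

    # Lookahead: match "foo" only when "bar" follows, without consuming "bar".
    printf '%s\n' 'foobar' 'foobaz' | grep -oP 'foo(?=bar)'
else
    echo 'grep -P is not available here' >&2
fi
```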
... Having said that, if you really do want to convert egrep regular expressions to grep regular expressions without invoking any external process, try ${regex//pattern/substitution} for each of the egrep special characters (the double slash replaces every occurrence, whereas a single slash replaces only the first); but recognize that this does not handle character classes, negated character classes, or backslash escapes correctly.
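A minimal sketch of that substitution approach, assuming bash. The function name ere_to_bre is made up for illustration. It uses the global form ${regex//pattern/substitution} so that every occurrence is replaced, and it shares the limitations noted above: bracket expressions and already-escaped characters are not treated specially. Also note that \|, \+ and \? in basic syntax are GNU extensions rather than POSIX.

```shell
#!/usr/bin/env bash

# ere_to_bre (hypothetical helper): put a backslash before each
# character that is special only in extended syntax.
# Naive: it also rewrites characters inside [...] and after a backslash.
ere_to_bre() {
    local regex=$1 ch
    for ch in '(' ')' '|' '+' '?' '{' '}'; do
        # Quoting "$ch" makes the pattern literal (so ? is not a glob);
        # \\ in the replacement yields a single backslash.
        regex=${regex//"$ch"/\\$ch}
    done
    printf '%s\n' "$regex"
}

ere_to_bre '(hello|bye)'
ere_to_bre '(ab){2}'
```

The result can then be passed to plain grep, for example grep "$(ere_to_bre '(ab){2}')" somefile, although grep -E '(ab){2}' somefile does the same job with less machinery.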