Tags: regex, bash, sed

Is there another regular-expression "flavor" in GNU sed?


I love sed, but I hate how many backslashes its regular expressions need. For example, here is a sed command that removes the first 8 words from each line of input:

sed -n 's/^\(\S\+\s\+\)\{8\}\(.*\)/\2/p'

Ugly.

Almost every character has a backslash preceding it. It would be much nicer if sed would assume that special characters were special by default.

Here is how I would like the expression to look:

s/^(\S+\s+){8}(.*)/\2/p

Is there a way to achieve this?


Solution

  • Switch to ERE in sed

    As Avinash Raj has pointed out, sed uses basic regular expression (BRE) syntax by default, which requires (, ), {, and } to be preceded by \ to activate their special meaning. The -r option switches to extended regular expression (ERE) syntax, which treats (, ), {, and } as special without a preceding \.
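
    As a minimal sketch of the difference, assuming GNU sed (which also accepts -E as a synonym for -r), the command from the question can be run in both syntaxes. The sample line and the variable names are made up for illustration:

```shell
# Hypothetical sample input: ten whitespace-separated words.
line='one two three four five six seven eight nine ten'

# BRE (default): groups, intervals, and + all need a backslash.
bre=$(printf '%s\n' "$line" | sed -n 's/^\(\S\+\s\+\)\{8\}\(.*\)/\2/p')

# ERE (-r): the same expression without the extra backslashes.
# \S and \s keep their backslash because they are escape sequences.
ere=$(printf '%s\n' "$line" | sed -rn 's/^(\S+\s+){8}(.*)/\2/p')

printf 'BRE: %s\n' "$bre"   # BRE: nine ten
printf 'ERE: %s\n' "$ere"   # ERE: nine ten
```

    Because of -n and the p flag, lines with fewer than 8 words produce no output in either form.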

    POSIX standard

    In ERE, the POSIX standard defines only these escape sequences:

    \^    \.    \[    \$    \(    \)    \|
    \*    \+    \?    \{    \\
    

    and explicitly leaves the behavior of any other escape sequence undefined.

    Quoting the standard: "An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set, except for the ERE special characters listed in ERE Special Characters. The interpretation of an ordinary character preceded by a backslash ( '\' ) is undefined."

    Since the behavior is undefined, implementations are free to provide extensions to the syntax.

    GNU extensions to escape sequences

    As rici noted in a comment, \s and \S are GNU extensions. The GNU implementation also provides the following extensions to regular expression and replacement string syntax (for both BRE and ERE):

    \a \f \n \r \t \v
    \cX
    \dXXX
    \oXXX
    \xXX
    

    and the following extensions for use in regular expression only:

    \w \W
    \b \B
    \'
    \`
    

    Plus these undocumented/under-documented extensions:

    \s \S
    \< \>
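
    A few of these extensions can be tried out as follows. This is only a sketch, assuming GNU sed; the input strings and variable names are invented for the example:

```shell
# \w matches a word character: wrap every word in brackets.
wrapped=$(printf 'foo bar\n' | sed -r 's/\w+/[&]/g')
printf '%s\n' "$wrapped"    # [foo] [bar]

# \b matches at a word boundary: "cat" inside "concat" is skipped.
bounded=$(printf 'concat cat\n' | sed 's/\bcat\b/dog/')
printf '%s\n' "$bounded"    # concat dog

# \t also works in the replacement string: turn the first space into a tab.
tabbed=$(printf 'a b\n' | sed 's/ /\t/')
printf '%s\n' "$tabbed"
```

    Other sed implementations may reject these sequences or treat them as literal characters, since POSIX leaves their meaning undefined.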
    

    If the code never runs on a non-GNU implementation of sed, your current code is acceptable.
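
    If portability does matter, one possible approach (a sketch, not the only option) is to replace the GNU-only \S, \s, and \+ with POSIX bracket expressions and interval expressions. The sample line is again hypothetical:

```shell
# Hypothetical sample input: ten whitespace-separated words.
line='one two three four five six seven eight nine ten'

# POSIX BRE: [^[:space:]] stands in for \S, [[:space:]] for \s,
# and \{1,\} for the GNU-only \+.
portable=$(printf '%s\n' "$line" |
  sed -n 's/^\([^[:space:]]\{1,\}[[:space:]]\{1,\}\)\{8\}\(.*\)/\2/p')

printf '%s\n' "$portable"   # nine ten
```

    This form is longer than the GNU version, but it relies only on syntax that POSIX defines for BRE.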