
Which characters combined with ^ don't need to be escaped in sed?


I have checked that ^* and ^& match lines beginning with * and &, which I didn't expect, since they are special characters. But ^[ doesn't work. Is this "standard" behavior? Is there any rationale behind it?

sed version used was "GNU sed 4.4".


Solution

  • From POSIX.1-2017:

    The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]

    The POSIX section on BREs states:

    A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:

    • .[\: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.
    • *: The <asterisk> shall be special except when used:
      • In a bracket expression
      • As the first character of an entire BRE (after an initial '^', if any)
      • As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters
    • ^: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).
    • $: The <dollar-sign> shall be special when used as an anchor.

    source: Basic Regular Expressions, Special characters
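
    These context rules are easy to try out. The commands below are a small sketch assuming GNU sed and a POSIX shell; other sed implementations may differ where POSIX leaves the behavior open:

```shell
# Assumes GNU sed. Each command prints the result of one substitution.

# '*' as the first character of the whole BRE is an ordinary character.
printf '%s\n' 'a*b' | sed 's/*/X/'            # -> aXb

# '^' that is not first and '$' that is not last are ordinary characters.
printf '%s\n' 'a^b$c' | sed 's/a^b$c/X/'      # -> X

# '.' outside a bracket expression is special: it matches any character.
printf '%s\n' 'abc' | sed 's/a.c/X/'          # -> X

# A preceding backslash turns '.' back into a literal period.
printf '%s\n' 'a.c abc' | sed 's/a\.c/X/'     # -> X abc
```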

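    So, to answer the OP's question using the above:

    • ^*: an <asterisk> that is the first character of an entire BRE (after an initial '^') is not special, so ^* matches a line that begins with a literal *.
    • ^&: the <ampersand> is not a BRE special character at all (it is only special in the replacement part of the s command), so ^& matches a line that begins with &.
    • ^[: the <left-square-bracket> is special everywhere outside a bracket expression, so ^[ opens a bracket expression that is never closed. POSIX leaves this undefined and GNU sed rejects it; a literal [ at the start of a line has to be written as ^\[.

    All of the OP's observations are therefore standard behavior.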
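
    The three cases can be reproduced with a few one-liners. This is a sketch assuming GNU sed; the sample input lines are made up for illustration:

```shell
# Assumes GNU sed. Hypothetical sample input, one line per case.
input='*star
&more
[bracket
plain'

# '*' right after the initial '^' is an ordinary character.
printf '%s\n' "$input" | sed -n '/^*/p'      # prints: *star

# '&' is not special in a BRE at all.
printf '%s\n' "$input" | sed -n '/^&/p'      # prints: &more

# Outside a bracket expression, '[' must be escaped to be literal.
printf '%s\n' "$input" | sed -n '/^\[/p'     # prints: [bracket

# Unescaped, '^[' starts a bracket expression that is never closed:
# GNU sed reports an error on stderr and exits with a non-zero status.
printf '%s\n' "$input" | sed -n '/^[/p' || echo 'sed rejected /^[/'
```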

    There is, however, another interesting paragraph in RE Bracket Expression:

    A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> ( ] ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex> ( ^ ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters ., *, [, and \ ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

    source: Basic Regular Expressions, RE Bracket Expression
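
    The last sentence of that paragraph can be checked directly: inside a bracket expression, ., *, [ and \ are ordinary characters. A minimal sketch, assuming GNU sed and a POSIX shell:

```shell
# Assumes GNU sed. Inside [...], '.', '*', '[' and '\' lose their special
# meaning, so this bracket expression matches any one of those four characters.
# ('[' is literal here because it is not followed by '.', '=' or ':'.)
printf '%s\n' 'a.b*c[d\e' | sed 's/[.*[\]/_/g'    # -> a_b_c_d_e
```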

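    Because the <backslash> loses its special meaning inside a bracket expression, ] cannot be escaped there: \] is a literal backslash followed by the ] that closes the expression. This means: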

    The following work:

    $ echo '[]' | sed 's/[^]x]/a/'
    a]
    $ echo '[]' | sed 's/[^x[.].]]/a/'
    a]
    

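    but this does not work as expected, because [^x\] is already a complete bracket expression (any character other than x and \) and the final ] is an ordinary character, so the whole of [] is replaced:

    $ echo '[]' | sed 's/[^x\]]/a/'
    a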
    

    So in a bracket expression, don't escape ]: either put it first in the list, or collate it!
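
    The put-it-first rule also gives a common idiom for matching either bracket character. A short sketch, assuming GNU sed:

```shell
# Assumes GNU sed. ']' placed first in the list is literal, and the '['
# after it is literal too (it is not followed by '.', '=' or ':'),
# so [][] matches either bracket character.
printf '%s\n' 'a[b]c' | sed 's/[][]/_/g'      # -> a_b_c

# In a non-matching list, ']' goes right after the '^'.
printf '%s\n' 'a[b]c' | sed 's/[^][]/_/g'     # -> _[_]_
```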