I have checked that ^* and ^& match lines beginning with * and &, which I didn't expect, since they are special characters. But ^[ doesn't work. Is this "standard" behavior? Is there any rationale behind this?
The sed version used was "GNU sed 4.4".
From POSIX.1-2017:
The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]
The POSIX section on BREs states:
A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:
.[\
: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.
*
: The <asterisk> shall be special except when used:
- In a bracket expression
- As the first character of an entire BRE (after an initial '^', if any)
- As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters
^
: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).
$
: The <dollar-sign> shall be special when used as an anchor.
So, to answer the OP's question using the above:
- & is not a BRE special character (it is special only in the replacement part of the s command), so ^& is expected to work.
- [ should always be escaped if it is not part of a bracket expression; an unescaped ^[ produces undefined results.
- * is not special after an initial ^ when the latter is an anchor.
So everything the OP observed is consistent with the standard.
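As a quick check of the three cases above, here is a minimal sketch. The sample input lines are made up for illustration, and it assumes a POSIX-conforming sed such as the GNU sed 4.4 from the question.

```shell
# Made-up sample lines, one starting with each character in question.
input='*star
&amp
[bracket
plain'

# ^* : the asterisk directly after the anchor is an ordinary character.
star=$(printf '%s\n' "$input" | sed -n '/^*/p')

# ^& : the ampersand is never special in a BRE.
amp=$(printf '%s\n' "$input" | sed -n '/^&/p')

# ^\[ : the left square bracket is escaped because it is not part of
# a bracket expression.
bracket=$(printf '%s\n' "$input" | sed -n '/^\[/p')

printf '%s\n' "$star" "$amp" "$bracket"
```

Each command should print only the line that starts with the character being tested.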
There is however still an interesting paragraph in RE Bracket Expression:
A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> (']') shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex> ('^'), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as "[.].]") or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
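The last sentence of that paragraph can be illustrated with a small sketch. The sample string is made up, and the list is ordered so that the backslash comes last, right before the closing bracket.

```shell
# Inside a bracket expression, * . [ and \ are ordinary list members,
# so this replaces every one of those four characters with an underscore.
# printf is used instead of echo to avoid any backslash processing.
out=$(printf '%s\n' 'a.b*c[d\e' | sed 's/[*.[\]/_/g')
printf '%s\n' "$out"
```

Note that the backslash does not escape the closing ] here; it is simply one more character in the list.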
This implies that ] cannot be escaped in a bracket expression. This means:
The following work:
$ echo '[]' | sed 's/[^]x]/a/'
a]
$ echo '[]' | sed 's/[^x[.].]]/a/'
a]
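but this does not work as expected:
$ echo '[]' | sed 's/[^x\]]/a/'
a
Here the backslash is an ordinary list member, so [^x\] is the whole bracket expression (any character except x and backslash) and the final ] is a literal character. The expression therefore matches the two-character sequence [] and replaces all of it, instead of giving a].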
So in a bracket expression, don't escape ]: put it first in the list or write it as a collating symbol!
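To see how the escaped form is actually parsed, the sketch below runs it against a few made-up inputs. It assumes POSIX bracket-expression rules, under which [^x\] is the complete bracket expression and the trailing ] is an ordinary character.

```shell
# [^x\]] needs two characters: one that is neither x nor backslash,
# followed by a literal ].
r1=$(printf '%s\n' 'b]' | sed 's/[^x\]]/a/')   # both characters are consumed
r2=$(printf '%s\n' 'x]' | sed 's/[^x\]]/a/')   # x is excluded: no match
r3=$(printf '%s\n' '\]' | sed 's/[^x\]]/a/')   # backslash is excluded: no match

# Putting ] first in the list gives the intended "neither x nor ]".
r4=$(printf '%s\n' 'b]' | sed 's/[^]x]/a/')

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

Only the last command replaces a single character and leaves the ] in place.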