The beginning of a 'heredoc' string in bash usually looks like
cat <<EOF or cat << EOF
i.e. there may or may not be a space between the two less-than characters and the token word "EOF". I would like to catch the token word, so I try the following
$ pcretest
PCRE version 8.45 2021-06-15
re> "^\s*cat.*[^<]<{2}[^<](.*)"
data> cat << EOF
0: cat << EOF
1: EOF
data> cat <<EOF
0: cat <<EOF
1: OF
As you can see in the string where there is no space between the << and EOF, I only catch "OF" and not "EOF". The expression must match exactly two less-than signs and fail if there are three or more. But why does it gobble up the 'E' so that only "OF" is returned?
In your pattern using are using a negated character class [^<]
which matches a single character other than <
which in this case is the E
char in the string <<EOF
For your examples and using pcre, you could match the leading spaces and then match <<
without a following <
^\h*cat\h+<<(?!<)(.*)
The pattern matches:
^
Start of string\h*
Match optional horizontal whitespace charscat\h+
Match cat
and 1+ horizontal whitespace chars<<(?!<)
Match <<
and assert not <
directly to the right(.*)
Capture optional chars in group 1See a regex demo