I thought that in regular expressions, the "greediness" applies to quantifiers rather than matches as a whole. However, I observe that
grep -E --color=auto 'a+(ab)?' <(printf "aab")
returns aab with the whole string highlighted, rather than highlighting only the leading aa.
The same applies to sed. On the other hand, in pcregrep and other tools, it is really the quantifier that is greedy. Is this a specific behaviour of grep?
N.B. I checked both grep (BSD grep) 2.5.1-FreeBSD and grep (GNU grep) 3.1
In its definition of the term "matched", POSIX states that
The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where "first" is defined to mean "begins earliest in the string". If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.
This statement clearly answers your question. Several substrings of aab begin at the same (first) position and match the ERE a+(ab)?, among them aa and aab. The latter is the longest, so it is the one that is matched.