The 2024 version of POSIX specifies the `?` repetition modifier. The published document specifies that the `?` modifier changes the behavior of its immediately preceding repetition from matching the left-most longest match to matching the left-most shortest match. A `REG_MINIMAL` flag has also been added to the C interface to alter the overall behavior of an ERE match. The original proposal can be found here, and it was subsequently amended here.
This question seeks clarification on several aspects:
1. The `?` applies to a repetition, but how does it achieve the "shortest" match? For example, given `(a.+b)+?`, does the behavior of the `+` inside the parentheses accommodate the outer `+`?
2. For an overall match with groupings/captures/subpatterns (correct my terminology if you spot a mistake), it is said:
   "... each subpattern, from left to right, shall match the longest possible string ..."
   How does that rule apply to subpatterns modified with the new `?` repetition modifier? Since `?` applies to the repetition, it shouldn't matter for the length of the parenthesized subpattern proper. Can someone confirm this?
What would be useful examples to demonstrate this?
I am asking specifically about its use with greedy/lazy quantifiers nested inside parenthesized groups, and about the relationship between partial and overall matches, whereas "What do 'lazy' and 'greedy' mean in the context of regular expressions?" focuses only on the greediness/laziness concept itself.
The dust has settled.
In the revised text to be published in the next TC, a lazy quantifier takes precedence over the "overall longest" rule (but not over the left-most rule). A lazy quantifier selects the "shortest" match.
As for the example `(a.+b)+?` in point 1: yes, the inner `+` does need to accommodate the outer `+?`. As explained in the standard, all possible matches are enumerated to see which best fits the precedence requirements of the quantifier operators.
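To make the enumeration idea concrete, here is a minimal Python sketch. It is only an illustration, not the algorithm from the standard: Python's `re` module is a Perl-style engine rather than a POSIX one, so it is used here purely as a membership test, and the helper name `candidate_matches` is made up for this example. The helper lists every substring, starting at the left-most position where anything matches, that the pattern can match in full. The pattern is written with greedy quantifiers because greediness does not change which strings can be matched in full, only which one gets picked.

```python
import re


def candidate_matches(pattern, text):
    """List every substring at the left-most matching start that the
    pattern can match in full (hypothetical helper, illustration only)."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # fullmatch(text, start, end) asks: does text[start:end] match entirely?
        found = [
            text[start:end]
            for end in range(start, len(text) + 1)
            if compiled.fullmatch(text, start, end)
        ]
        if found:
            # The left-most rule wins first, so stop at the first start
            # position that yields any candidate at all.
            return found
    return []


print(candidate_matches(r"(a.+b)+", "axbaxb"))  # ['axb', 'axbaxb']
```

With these two candidates, the "overall longest" rule would pick `axbaxb`, and, as I read the revised text, a lazy outer quantifier would pick `axb` instead, which is only possible if the inner `+` gives up its longest match.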
As for point 2, it does matter for subexpressions. As explained, this is because POSIX semantics are concerned with length, rather than with repetition count as in the PCRE world.
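For contrast, here is how a Perl-style backtracking engine treats the same pattern. This sketch uses Python's `re` module as a stand-in for the PCRE world (it is not a POSIX implementation and knows nothing about `REG_MINIMAL`), and the helper name `perl_style` is made up for this example. There, the lazy outer `+?` only minimizes the number of repetitions of the group; it does not make the greedy inner `+` any shorter.

```python
import re


def perl_style(pattern, text):
    """Return (whole match, last capture of group 1) as seen by Python's
    backtracking engine (hypothetical helper, illustration only)."""
    m = re.match(pattern, text)
    return (m.group(0), m.group(1)) if m else None


text = "axbaxb"

# Lazy outer quantifier, greedy inner one: one repetition, but a long one.
print(perl_style(r"(a.+b)+?", text))   # ('axbaxb', 'axbaxb')

# Only when the inner quantifier is lazy as well does the match shrink.
print(perl_style(r"(a.+?b)+?", text))  # ('axb', 'axb')
```

The first result is a single repetition that is not the shortest possible string, which is the difference between counting repetitions and measuring length.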