regex pcre2

Expression steals from previously possessively matched characters


When using PCRE2 on regex101, the pattern \G(.*?(?:<[^>]*>|&[^;]*;)*+)*?(micro) applied to the string "<test><micro>" behaves unexpectedly: group 1 first matches the entire string (as expected), but (micro) then goes back and steals from the previously possessively matched content. Is this expected behavior? If so, why does it happen, and how do I avoid it?

What I tried:

What I expect:

The regex should match everything up to a "micro" that is neither inside <> nor inside &; and capture it in group 1, then capture the "micro" in group 2.


Solution

  • You could use

    (?:<[^>]*>|&[^; ]*;)(*SKIP)(*F)|micro
    

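    The pattern matches:

      • (?:<[^>]*>|&[^; ]*;) - either a <...> tag, or an &...; entity that contains no space
      • (*SKIP)(*F) - discards what was just matched and fails, so the next attempt starts after the skipped text
      • | - or
      • micro - the literal text micro, which at this point can only be outside a tag or entity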

    See a regex demo