php regex

Why is this regex backtracking even though it uses an atomic group?


I have this regex aaa.+?(?>bbb)j and this input string aaa xxx bbby xxx bbbj. When I run the regex, it matches aaa xxx bbby xxx bbbj. But since I used an atomic group (?>bbb), I expected the match to fail, because I read online that the regex engine does not backtrack when an atomic group is used.

So, on finding the first bbb in the input string, it should "stick" with it, then check the next letter, which is y rather than the intended j, and so give up and fail. However, for some reason the regex engine keeps trying and eventually finds the last bbbj. How can I make it fail at the first bbb it finds?

NOTE: I tried both a greedy and a lazy quantifier, but it didn't make a difference in the case above.
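
A minimal reproduction of the behaviour described above, as a sketch assuming the pattern is run through PHP's preg_match (the variable names and the '/' delimiters are illustrative and not part of the original question):

```php
<?php
// Input string from the question.
$subject = 'aaa xxx bbby xxx bbbj';

// Lazy and greedy variants of the same pattern.
$patterns = [
    'lazy'   => '/aaa.+?(?>bbb)j/',
    'greedy' => '/aaa.+(?>bbb)j/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $found = preg_match($pattern, $subject, $m);
    $results[$label] = ($found === 1) ? $m[0] : null;
    echo $label, ': ', var_export($results[$label], true), PHP_EOL;
}
```

Both variants report the whole string aaa xxx bbby xxx bbbj as the match, which is the result the question is asking about.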


Solution

  • Backtracking inside an atomic group is independent of, and unrelated to,
    the other parts of the expression.
    Therefore bbb will match independently.

    The expression inside an atomic group is a separate expression that maintains its own state
    with regard to backtracking. When it comes across the same territory in the source,
    it will match exactly the same text every time.

    For example (?>bbb?) will always match bbb when it comes across it.
    Since it's an actual separate expression, it can backtrack within itself
    to match the most it can.

    The only reason (?>bbb)j did not match at bbby is that the j was missing there,
    even though the bbb itself was matched. The atomic group only prevents backtracking into bbb;
    it does not stop .+? from consuming more text, so the engine advanced and found bbbj further along.

    Note that assertions (lookaheads and lookbehinds) are also atomic in nature: once an assertion
    has succeeded or failed, the engine does not backtrack into it.
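
The points above can be checked with a short sketch, assuming PHP's preg_match. The $simple input and the pattern aaa(?>.+?bbb)j are assumptions added here for illustration; the rewritten pattern is one possible way to get the failure the question asks for, not something stated in the original answer.

```php
<?php
// The question's input, plus a hypothetical second input with a single bbb.
$subject = 'aaa xxx bbby xxx bbbj';
$simple  = 'aaa xxx bbbj';

// 1. An atomic group keeps whatever it matched: (?>bbb?) takes all of
//    "bbb" and cannot hand the last b back to the trailing b.
$atomic    = preg_match('/(?>bbb?)b/', 'bbb');   // atomic group
$nonAtomic = preg_match('/(?:bbb?)b/', 'bbb');   // ordinary group

// 2. In the original pattern the lazy .+? sits outside the atomic group,
//    so it is still free to grow past the first bbb.
$original = preg_match('/aaa.+?(?>bbb)j/', $subject, $m1);

// 3. Assumed fix: with .+? inside the group, the first bbb is locked in.
$locked       = preg_match('/aaa(?>.+?bbb)j/', $subject);
$lockedSimple = preg_match('/aaa(?>.+?bbb)j/', $simple, $m2);

var_dump($atomic, $nonAtomic, $original, $locked, $lockedSimple);
```

Here (?>bbb?)b finds no match in bbb while (?:bbb?)b does, the original pattern still matches the whole input, and aaa(?>.+?bbb)j fails on the question's input while still matching aaa xxx bbbj. Moving .+? inside the group works because the engine can no longer return to .+? and extend it once the group has been left.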