I have come across a behaviour of the search function in Python's re module that made me think there is an implicit \b anchor in the pattern. Is this the case?
import re

text = "bowl"
print(re.search(r"b|bowl", text))  # the first alternative in this pattern works
print(re.search(r"o|bowl", text))  # but the first alternative won't work here
print(re.search(r"w|bowl", text))  # nor here
print(re.search(r"l|bowl", text))  # nor here
print(re.search(r"bo|bowl", text))  # the first alternative in this pattern works
print(re.search(r"bow|bowl", text))  # the first alternative in this pattern works
OUTPUT
<re.Match object; span=(0, 1), match='b'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 2), match='bo'>
<re.Match object; span=(0, 3), match='bow'>
I tried to research whether this is the case, but I couldn't find any explanation.
I'm not a regex expert, so I'll use simple words to describe what happens internally.
re.search works from left to right through the string, and the alternatives separated by | are also tried from left to right. Also, re.search is different from re.match: it moves forward to try to find the pattern anywhere in the string, not just at the start.
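To make the difference between the two functions concrete, here is a small sketch using the same text as in the question. It only uses re.match and re.search from the standard library:

```python
import re

text = "bowl"

# re.match only tries the pattern at position 0, where the character is "b".
print(re.match(r"o", text))   # None

# re.search retries the pattern at each position, moving left to right,
# so it finds the "o" at index 1.
print(re.search(r"o", text))  # match 'o' with span (1, 2)
```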
Take this:
re.search(r"o|bowl", text)
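The matcher starts on the b character of the input string. The o alternative is tested first and doesn't match there, so the engine tries the second alternative, bowl, at the same position. If that had failed too, the engine would move on to the next character (since all possibilities at this position would be exhausted) and would then match o. But bowl does match at position 0, so that never happens: the characters of bowl are consumed and returned as the match. No implicit \b anchor is involved.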
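The order of operations can be sketched in plain Python. This is only a rough model of the behaviour, not how the re module is implemented internally, and first_match is a hypothetical helper name made up for illustration. It assumes each alternative is a simple pattern that can be tried on its own with the compiled pattern's match(text, pos) method:

```python
import re

def first_match(alternatives, text):
    """Hypothetical helper: rough model of how search handles a|b."""
    # Outer loop: start positions, left to right.
    for start in range(len(text) + 1):
        # Inner loop: alternatives, left to right, at this position only.
        for alt in alternatives:
            m = re.compile(alt).match(text, start)
            if m:
                return m  # first success wins; later positions are never tried
    return None

text = "bowl"

m = first_match(["o", "bowl"], text)
print(m.span(), m.group())  # (0, 4) bowl

m = first_match(["b", "bowl"], text)
print(m.span(), m.group())  # (0, 1) b
```

The position loop is the outer one, which is why bowl at index 0 beats o at index 1 even though o is written first in the pattern.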
If you try:
re.search("o|bar", text)
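then o will be matched, because neither alternative matches at position 0 and o is the first one to succeed, at position 1.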
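Note that this is not specific to Python. Regex engines in general report the leftmost match, and backtracking engines such as Python's try the alternatives in order at each position.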
If you want the alternate behaviour you could write:
re.search("o", text) or re.search("bar", text)
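If there are more than two patterns, the same idea can be wrapped in a loop. This is just a sketch, and search_by_priority is a hypothetical name, not something from the re module. It runs a full search for each pattern in turn, so an earlier pattern wins even when a later one would match further left:

```python
import re

def search_by_priority(patterns, text):
    """Hypothetical helper: return the first pattern's match, in list order."""
    for pattern in patterns:
        m = re.search(pattern, text)  # scans the whole string for this pattern
        if m:
            return m
    return None

text = "bowl"
m = search_by_priority(["o", "bowl"], text)
print(m.span(), m.group())  # (1, 2) o
```

The cost is one scan of the string per pattern instead of a single scan for the combined pattern.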