I'm trying to match usernames that are not preceded by either RT @ or RT@, using lookbehinds paired with conditionals, as explained in this tutorial.
The regex and the example are shown in Example 1:
Example 1
import re
text = 'RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3'
mt_regex = r'(?i)(?<!RT )&(?<!RT)@(\w+)'
mt_pat = re.compile(mt_regex)
re.findall(mt_pat, text)
which outputs [] (an empty list), while the desired output is:
['u2', 'u4', 'u3', 'u1']
What am I missing? Thanks in advance.
If we break down your regex:
r"(?i)(?<!RT )&(?<!RT)@(\w+)"
(?i) makes the remainder of the pattern case-insensitive
(?<!RT ) negative lookbehind
asserts that the text immediately before the current position is not 'RT '
& matches the character '&' literally
(?<!RT) negative lookbehind
asserts that the text immediately before the current position is not 'RT'
@ matches the character '@' literally
(\w+) capturing group
matches one or more word characters (letters, digits and underscore; [a-zA-Z0-9_] in ASCII mode)
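The role of the & in this breakdown can be checked directly. The following is a minimal sketch, not part of the original question: the two input strings are made up for illustration, and the only difference between them is a literal '&' placed right before the '@'.

```python
import re

# The pattern from the question, with the literal '&' still in it.
original = re.compile(r'(?i)(?<!RT )&(?<!RT)@(\w+)')

# Hypothetical inputs, invented only to isolate the effect of '&'.
without_amp = original.findall('hello @u9')   # no '&' before '@'
with_amp = original.findall('hello &@u9')     # literal '&' before '@'

print(without_amp)
print(with_amp)
```

The pattern only finds a username when an actual '&' sits directly in front of the '@', which shows that the regex engine reads & as an ordinary character rather than as an "and" operator.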
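The literal & is what prevents your regex from matching: the text contains no '&' character, and lookbehinds need no operator between them, because consecutive lookbehinds must all hold at the same position anyway. Removing it gives: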
import re
text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"
mt_regex = r"(?i)(?<!RT )(?<!RT)@(\w+)"
mt_pat = re.compile(mt_regex)
print(re.findall(mt_pat, text))
# ['u2', 'u4', 'u3', 'u1']
See this regex here
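As a side note on why two separate lookbehinds are used at all: Python's standard re module only accepts fixed-width lookbehinds, so the two cases cannot be folded into a single (?<!RT ?). The sketch below illustrates this under that assumption; the variable names are hypothetical and the text is the one from the question.

```python
import re

text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"

# A single lookbehind with an optional space has variable width,
# which the standard `re` module refuses to compile.
try:
    re.compile(r"(?i)(?<!RT ?)@(\w+)")
    single_lookbehind_ok = True
except re.error:
    single_lookbehind_ok = False

# Two fixed-width lookbehinds side by side: both must hold before '@'.
stacked = re.compile(r"(?i)(?<!RT )(?<!RT)@(\w+)")
result = stacked.findall(text)

print(single_lookbehind_ok)
print(result)
```

Placing the two assertions next to each other is what gives the "neither RT @ nor RT@" behaviour, since each lookbehind is checked at the same position just before the '@'.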