
Check that both `lookbehind` conditions are satisfied in `RegEx`


I'm trying to match usernames that are not preceded by either RT @ or RT@ by combining two negative lookbehind conditions, as explained in this tutorial. The regex and the example are shown in Example 1:

Example 1

import re

text = 'RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3'

mt_regex = r'(?i)(?<!RT )&(?<!RT)@(\w+)'

mt_pat = re.compile(mt_regex)

re.findall(mt_pat, text)

which outputs [] (an empty list), whereas the desired output is:

['u2', 'u4', 'u3', 'u1']

What am I missing? Thanks in advance.


Solution

  • If we break down your regex:

    r"(?i)(?<!RT )&(?<!RT)@(\w+)"
    (?i)        case-insensitive matching for the whole pattern
    (?<!RT )    negative lookbehind
                asserts that 'RT ' does not match
    &           matches the character '&' literally
    (?<!RT)     negative lookbehind 
                asserts that 'RT' does not match
    @           matches the character '@' literally
    (\w+)       capturing group
                matches one or more word characters ([a-zA-Z0-9_], plus other
                Unicode letters and digits, since this is a str pattern)
    

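    The & is what prevents your regex from matching. Regex has no & (AND) operator, so & matches a literal & character, and your text never has an & right before an @. Lookbehinds are zero-width, so writing two of them one after another already requires both to hold at the same position. Just remove the &: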

    import re
    
    text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"
    mt_regex = r"(?i)(?<!RT )(?<!RT)@(\w+)"
    mt_pat = re.compile(mt_regex)
    
    print(re.findall(mt_pat, text))
    # ['u2', 'u4', 'u3', 'u1']
    

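To see why the fix works, here is a small sketch that checks three things: & matches only a literal &, two stacked lookbehinds behave as an AND, and a single lookbehind covering both RT @ and RT@ is rejected by Python's built-in re because it is not fixed-width. The extra sample string "x &@u9" and the two rejected candidate patterns are made-up illustrations, not from the question.

```python
import re

text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"

# '&' is not an AND operator: it only matches a literal '&' character,
# so the original pattern needs an '&' immediately before the '@'.
with_ampersand = re.findall(r"(?i)(?<!RT )&(?<!RT)@(\w+)", text)
literal_demo = re.findall(r"(?i)(?<!RT )&(?<!RT)@(\w+)", "x &@u9")

# Lookbehinds are zero-width, so both are checked at the same position
# (just before '@'); stacking them already means "neither precedes it".
stacked = re.findall(r"(?i)(?<!RT )(?<!RT)@(\w+)", text)

# Folding both cases into one lookbehind needs a variable-width pattern,
# which Python's built-in re rejects at compile time.
rejected = []
for candidate in (r"(?<!RT ?)@(\w+)", r"(?<!RT |RT)@(\w+)"):
    try:
        re.compile(candidate)
    except re.error as exc:
        rejected.append((candidate, str(exc)))

print(with_ampersand)  # []
print(literal_demo)    # ['u9']
print(stacked)         # ['u2', 'u4', 'u3', 'u1']
for candidate, message in rejected:
    print(candidate, "->", message)
```

This is why the two separate lookbehinds are needed with the standard re module: each one has a fixed width (3 and 2 characters), even though the combined condition does not.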