python regex

How to skip strings that start with a given prefix, but match in all other strings


I want to match and substitute within strings as shown in the example below, but skip any string that starts with test or !!. I used a negative lookahead to skip the unwanted strings, but the (Street|,)(?=\d) part, which should match every Street and every comma that is followed by a digit so that UK/ can be appended after group 1, is not working as expected.

import re
# named "lines" rather than "input" to avoid shadowing the built-in input()
lines = ['Street1-2,4,6,8-10',
         '!! Street4/31/2',
         'test Street4']
pattern = r'(^(?!test\s|!!\s).*(Street|,)(?=\d))'
output = [re.sub(pattern, r'\g<1>UK/', line) for line in lines]

Actual output:

['Street1-2,4,6,UK/8-10', '!! Street4/31/2', 'test Street4']

Expected output:

['StreetUK/1-2,UK/4,UK/6,UK/8-10', '!! Street4/31/2', 'test Street4']
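
For reference, a small sketch of why the attempt above produces only one replacement per line. The pattern is anchored with ^, so it can match only once, at the start of the string, and the greedy .* runs to the end and then backtracks to the last Street or comma that is followed by a digit. The variable names here are illustrative only.

```python
import re

line = 'Street1-2,4,6,8-10'
original = r'(^(?!test\s|!!\s).*(Street|,)(?=\d))'

# Collect every match the original pattern finds in the line
matches = list(re.finditer(original, line))

print(len(matches))         # a single match, because of the ^ anchor
print(matches[0].group(1))  # everything up to and including the last comma
print(matches[0].group(2))  # the last comma only

# The substitution therefore inserts UK/ exactly once
result = re.sub(original, r'\g<1>UK/', line)
print(result)
```

Since group 1 spans the whole prefix, UK/ is appended once after the last comma instead of after every Street or comma.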

Solution

  • You could change the pattern to use 2 capture groups, and then use a callback with re.sub.

    The callback checks whether group 1 matched. If it did, the line is one to skip, so group 1 is returned unchanged; otherwise group 2 is returned followed by UK/

    ^((?:!!|test)\s.*)|(Street|,)(?=\d)
    

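    The regex matches

      • ^ Start of string
      • ((?:!!|test)\s.*) Capture group 1: match !! or test followed by a whitespace character, then the rest of the line
      • | Or
      • (Street|,) Capture group 2: match Street or a comma
      • (?=\d) Positive lookahead: assert a digit directly to the right

    Because the first alternative consumes an entire skipped line, the second alternative can never match inside it.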

    See a regex101 demo

    import re
    
    lst = ['Street1-2,4,6,8-10',
           '!! Street4/31/2',
           'test Street4']
    
    pattern = r'^((?:!!|test)\s.*)|(Street|,)(?=\d)'
    
    output = [re.sub(pattern, lambda m: m.group(1) or m.group(2) + 'UK/', line) for line in lst]
    
    print(output)
    

    Output

    ['StreetUK/1-2,UK/4,UK/6,UK/8-10', '!! Street4/31/2', 'test Street4']
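
As an alternative to the callback, the prefix check can be separated from the substitution. This is only a sketch of a different technique, not the approach above: the helper name add_uk and the constants SKIP and TOKEN are made up for illustration, and it assumes each string is a single line.

```python
import re

# Lines starting with !! or test followed by whitespace are left untouched
SKIP = re.compile(r'(?:!!|test)\s')
# Street or a comma, only when a digit follows directly
TOKEN = re.compile(r'(Street|,)(?=\d)')

def add_uk(line):
    """Append UK/ after each Street or comma followed by a digit, unless the line is skipped."""
    if SKIP.match(line):
        return line
    return TOKEN.sub(r'\g<1>UK/', line)

lst = ['Street1-2,4,6,8-10',
       '!! Street4/31/2',
       'test Street4']

output = [add_uk(line) for line in lst]
print(output)
```

This keeps each regex simple at the cost of one extra match per line; the single-pattern callback version does the same work in one pass.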