I want to match and substitute in strings as shown in the example below, but skip any string that starts with test or !!. I have used a negative lookahead to skip the unwanted strings, but the part (Street|,)(?=\d), which should match Street and every comma that is followed by a digit so that UK/ can be appended after each match, is not working as expected: only one replacement is made per line.
import re
input = [ 'Street1-2,4,6,8-10',
'!! Street4/31/2',
'test Street4' ]
pattern = r'(^(?!test\s|!!\s).*(Street|,)(?=\d))'
output = [re.sub(pattern, r'\g<1>UK/', line) for line in input ]
Actual output:
['Street1-2,4,6,UK/8-10', '!! Street4/31/2', 'test Street4']
Expected output:
['StreetUK/1-2,UK/4,UK/6,UK/8-10', '!! Street4/31/2', 'test Street4']
You could change the pattern to use 2 capture groups, and then use a callback with re.sub.
The callback checks whether group 1 has a value. If it does, the callback returns group 1 unchanged (the line is one that should be skipped); otherwise it returns group 2 followed by UK/.
^((?:!!|test)\s.*)|(Street|,)(?=\d)
The regex matches:
^((?:!!|test)\s.*) : capture either !! or test at the start of the string, followed by a whitespace char and then the rest of the line, in group 1
| : or
(Street|,)(?=\d) : capture either Street or , in group 2 while asserting a digit directly to the right
See a regex101 demo.
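To see which alternative fires on each kind of line, the matches can be listed with re.finditer. This is only an illustrative sketch; the helper name describe_matches is made up for this example and is not part of the answer's code.

```python
import re

pattern = re.compile(r'^((?:!!|test)\s.*)|(Street|,)(?=\d)')

def describe_matches(line):
    """Return a (group 1, group 2) tuple for every match found in the line."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(line)]

# A normal line: only the second alternative matches, once per Street/comma
# that has a digit directly to its right.
print(describe_matches('Street1-2,4,6,8-10'))
# [(None, 'Street'), (None, ','), (None, ','), (None, ',')]

# A line to skip: the first alternative consumes the whole line in one match,
# so the second alternative never gets a chance to match inside it.
print(describe_matches('test Street4'))
# [('test Street4', None)]
```

Because exactly one of the two groups is set for any match, the callback can tell the cases apart by checking group 1.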
import re
lst = ['Street1-2,4,6,8-10',
'!! Street4/31/2',
'test Street4']
pattern = r'^((?:!!|test)\s.*)|(Street|,)(?=\d)'
output = [re.sub(pattern, lambda m: m.group(1) or m.group(2) + 'UK/', line) for line in lst]
print(output)
Output
['StreetUK/1-2,UK/4,UK/6,UK/8-10', '!! Street4/31/2', 'test Street4']
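As for why the pattern in the question replaced only once: it is anchored with ^ and contains a greedy .*, so it can match at most one time per line, and that single match runs up to the last Street or comma that is followed by a digit. A small sketch that reproduces this (the variable names are chosen just for this illustration):

```python
import re

# The pattern from the question, unchanged.
original = r'(^(?!test\s|!!\s).*(Street|,)(?=\d))'
line = 'Street1-2,4,6,8-10'

# ^ can only match at position 0, so there is a single match. The greedy .*
# backtracks only as far as the last comma that has a digit after it.
matches = [m.group(0) for m in re.finditer(original, line)]
print(matches)  # ['Street1-2,4,6,']

# UK/ is therefore inserted once, after that single match.
result = re.sub(original, r'\g<1>UK/', line)
print(result)  # Street1-2,4,6,UK/8-10
```

The two-group pattern avoids this by not anchoring the Street/comma alternative, so re.sub can visit every occurrence on the line.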