I'm trying to split a string into words, but keep each delimiter attached to the previous word. I think it will be clearer with examples:
'Company Name: Back in future' -> ['Company', 'Name:', 'Back', 'in', 'future']
'SCALC:WY58' -> ['SCALC:', 'WY58']
'Our carrier.: LTC' -> ['Our', 'carrier.:', 'LTC']
'Lading# 1' -> ['Lading#', '1']
'Invoice of lading-91258963' -> ['Invoice', 'of', 'lading-', '91258963']
That is, as you can see, any special character should stay attached to the preceding word.
I've been looking for examples for quite a while, but I haven't found anything and I couldn't implement it myself.
Try splitting on this pattern (note that it begins with a space):
r" |(?<=[-:]) ?"
See: regex101
See Python Demo:
import re

# Requires Python 3.7+, where re.split() can split on empty (zero-width) matches.
strs = [
    'Company Name: Back in future',  # -> ['Company', 'Name:', 'Back', 'in', 'future']
    'SCALC:WY58',                    # -> ['SCALC:', 'WY58']
    'Our carrier.: LTC',             # -> ['Our', 'carrier.:', 'LTC']
    'Lading# 1',                     # -> ['Lading#', '1']
    'Invoice of lading-91258963',    # -> ['Invoice', 'of', 'lading-', '91258963']
    'SST #: 18965',                  # -> ['SST', '#:', '18965']
]

# Split on a single space, or just after a '-' or ':' (also consuming one optional space).
pattern = re.compile(r" |(?<=[-:]) ?")

for s in strs:
    print(pattern.split(s))
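Explanation:
" " : Option 1: split on a space.
| : Or
(?<=[-:]) : Option 2: split just to the right of either a hyphen or a colon. Because this is a lookbehind, the hyphen or colon is not consumed and stays attached to the previous word.
" ?" : The split point may be followed by a space, which is consumed so it does not end up in the next word.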
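The pattern above only treats "-" and ":" specially (the "#" and "." cases work because a space follows them). If "any special character" should be handled, one possible generalization is sketched below. It assumes a "special character" means anything that is neither a word character (letter, digit, underscore) nor whitespace, and it splits on runs of whitespace or at the empty position between such a character and a following word character. The function name split_keep_trailing_specials is hypothetical, not part of the original answer.

```python
import re
from typing import List

# Hypothetical generalization of the answer's pattern (an assumption, not the
# original): split on runs of whitespace, OR at the empty position that is
# preceded by a non-word, non-space character and followed by a word character.
# Zero-width splits like this need Python 3.7+.
SPLIT_RE = re.compile(r"\s+|(?<=[^\w\s])(?=\w)")


def split_keep_trailing_specials(text: str) -> List[str]:
    """Split text into words, keeping punctuation attached to the preceding word."""
    # Filter out empty strings produced by leading or trailing whitespace.
    return [part for part in SPLIT_RE.split(text) if part]


if __name__ == "__main__":
    examples = [
        'Company Name: Back in future',
        'SCALC:WY58',
        'Our carrier.: LTC',
        'Lading# 1',
        'Invoice of lading-91258963',
        'SST #: 18965',
    ]
    for s in examples:
        print(split_keep_trailing_specials(s))
```

One trade-off of this broader rule: anything like a decimal number or a hyphenated word is also split, so '3.14' becomes ['3.', '14']. Narrowing the character class (as the original [-:] does) avoids that.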