python string text split special-characters

Split the string so that special characters remain in the previous word


I'm trying to split a string into words while keeping each delimiter attached to the preceding word. Some examples should make this clearer:

'Company Name: Back in future' -> ['Company', 'Name:', 'Back', 'in', 'future']

'SCALC:WY58' -> ['SCALC:', 'WY58']

'Our carrier.: LTC' -> ['Our', 'carrier.:', 'LTC']

'Lading# 1' -> ['Lading#', '1']

'Invoice of lading-91258963' -> ['Invoice', 'of', 'lading-', '91258963']

In other words, any special character should remain part of the preceding word.

I've been looking for examples for quite a while, but I haven't found anything and I couldn't implement it myself.


Solution

  • Try:

     |(?<=[-:]) ?
    

    See: regex101

    See Python Demo:

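    import re

    strs = [
        'Company Name: Back in future',  # -> ['Company', 'Name:', 'Back', 'in', 'future']
        'SCALC:WY58',                    # -> ['SCALC:', 'WY58']
        'Our carrier.: LTC',             # -> ['Our', 'carrier.:', 'LTC']
        'Lading# 1',                     # -> ['Lading#', '1']
        'Invoice of lading-91258963',    # -> ['Invoice', 'of', 'lading-', '91258963']
        'SST #: 18965',                  # -> ['SST', '#:', '18965']
    ]

    # Split on a space, or at the position right after '-' or ':'
    # (consuming one following space if present).
    # Splitting on an empty match requires Python 3.7 or later.
    pattern = re.compile(r" |(?<=[-:]) ?")

    # Print each result so the demo also shows output when run as a script.
    for s in strs:
        print(pattern.split(s))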
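
The pattern above only splits after '-' and ':'; a string that ends in one of them (for example 'Name:') also yields a trailing empty string. If "any special character" should be covered, one possible alternative is to match the tokens with re.findall instead of splitting. The sketch below is not part of the original answer: the helper name split_keep_special and the TOKEN pattern are assumptions made for illustration, and "special character" is assumed to mean anything that is neither a word character (\w) nor whitespace.

```python
import re

# Hypothetical helper, not from the original answer.
# A token is either:
#   - a run of word characters followed by any punctuation glued to its end, or
#   - a free-standing run of punctuation (such as '#:').
TOKEN = re.compile(r"\w+[^\w\s]*|[^\w\s]+")


def split_keep_special(text):
    """Return the tokens of text, keeping trailing special characters on the preceding word."""
    return TOKEN.findall(text)


samples = [
    'Company Name: Back in future',
    'SCALC:WY58',
    'Our carrier.: LTC',
    'Lading# 1',
    'Invoice of lading-91258963',
    'SST #: 18965',
]

for s in samples:
    print(split_keep_special(s))
```

Because this matches tokens rather than splitting, no empty strings appear in the result. Note that it also breaks inside words that contain punctuation, so "it's" becomes "it'" and "s", and "3.14" becomes "3." and "14"; whether that is acceptable depends on the input data.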
    

    Explanation