python, string, split, python-re, apostrophe

Python re.split at all spaces and punctuation except for the apostrophe


I want to split a string at all spaces and punctuation except for the apostrophe. Preferably, a single quote should still be used as a delimiter, except when it is an apostrophe. I also want to keep the delimiters. Example string:
words = """hello my name is 'joe.' what's your's"""

Here is my re pattern so far:

    splitted = re.split(r"[^'-\w]", words.lower())

I tried putting the single quote after the ^ character, but it is not working.

My desired output is:

    splitted = ['hello', 'my', 'name', 'is', 'joe', '.', "what's", "your's"]
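A quick check of the attempt above, assuming Python 3's re module. Inside a character class, '-\w is read as a range from ' to \w, which re rejects when compiling. Moving the hyphen to the end of the class makes it a literal, and wrapping the class in a capture group keeps each delimiter, but a character class on its own still cannot tell an apostrophe from a quotation mark, so the quotes around joe stay in the output.

```python
import re

words = """hello my name is 'joe.' what's your's"""

# In [^'-\w], the sequence '-\w is parsed as a range from ' to \w,
# which the re module rejects when compiling the pattern.
try:
    re.compile(r"[^'-\w]")
    attempt_compiles = True
except re.error as exc:
    attempt_compiles = False
    print("re.error:", exc)

# With the hyphen moved to the end it is a literal, and the capture group
# keeps each delimiter; whitespace-only pieces are dropped afterwards.
naive = [s for s in re.split(r"([^\w'-])", words.lower()) if s.strip()]
print(naive)
# → ['hello', 'my', 'name', 'is', "'joe", '.', "'", "what's", "your's"]
```

The leftover "'joe" and "'" tokens are why the answer below uses lookarounds: whether a quote is a delimiter depends on the characters around it, not on the quote itself.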


Solution

  • One option is to use lookarounds to split at the desired positions, and a capture group for what you want to keep in the split.

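    After the split, remove the empty entries from the resulting list: re.split yields None wherever a match did not use the capture group, and an empty string between two adjacent matches.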

    \s+|(?<=\s)'|'(?=\s)|(?<=\w)([,.!?])
    

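    The pattern has four alternatives:

      • \s+ matches one or more whitespace characters
      • (?<=\s)' matches a single quote preceded by whitespace (an opening quote)
      • '(?=\s) matches a single quote followed by whitespace (a closing quote)
      • (?<=\w)([,.!?]) matches one of , . ! ? preceded by a word character, and captures it so it is kept in the result

    An apostrophe inside a word, as in what's, has a word character on both sides, so none of the alternatives match it.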


    Example

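    import re
    
    # Split on whitespace and on quotes next to whitespace; capture , . ! ? so re.split keeps them
    pattern = r"\s+|(?<=\s)'|'(?=\s)|(?<=\w)([,.!?])"
    words = """hello my name is 'joe.' what's your's"""
    # Drop the None and '' entries that re.split produces
    result = [s for s in re.split(pattern, words) if s]
    print(result)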
    

    Output

    ['hello', 'my', 'name', 'is', 'joe', '.', "what's", "your's"]
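The lookarounds in this pattern need whitespace next to a quote, so a quote at the very start or end of the string is not split off (a leading 'hi keeps its quote, and a trailing quote comes back as its own token). A possible variation, assuming those boundary quotes should be treated as delimiters too, replaces the positive lookarounds with negative ones: (?<!\w)' matches a quote not preceded by a word character, and '(?!\w) a quote not followed by one, so an apostrophe between two letters still never matches. The helper name split_keep_apostrophes and the two constant names are made up for this sketch.

```python
import re

# Pattern from the answer: a quote only splits when it touches whitespace.
ORIGINAL = r"\s+|(?<=\s)'|'(?=\s)|(?<=\w)([,.!?])"

# Hypothetical variation: a quote splits unless it sits between two word
# characters, which also covers quotes at the start or end of the string.
BOUNDARY_AWARE = r"\s+|(?<!\w)'|'(?!\w)|(?<=\w)([,.!?])"


def split_keep_apostrophes(text, pattern=BOUNDARY_AWARE):
    """Hypothetical helper: split text, keep captured punctuation, drop empties.

    re.split inserts the capture group's value between pieces; alternatives
    that do not capture contribute None, and adjacent matches produce ''.
    The truthiness check filters out both.
    """
    return [s for s in re.split(pattern, text) if s]


text = "'hi,' said joe. 'what's your's?'"
print(split_keep_apostrophes(text, ORIGINAL))
# → ["'hi", ',', 'said', 'joe', '.', "what's", "your's", '?', "'"]
print(split_keep_apostrophes(text))
# → ['hi', ',', 'said', 'joe', '.', "what's", "your's", '?']
```

On the example string from the question both patterns give the same result; they only differ when a quote sits at the edge of the string or directly after punctuation with no whitespace on either side.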