
Regex punctuation split with Python


Can anyone help me a bit with regexes? I currently have this: re.split(" +", line.rstrip()), which splits on runs of spaces.

How could I expand this to cover punctuation, too?


Solution

  • The official Python documentation has a good example for this one. The pattern \W+ splits on runs of non-alphanumeric characters (whitespace and punctuation); \W is the character class for all non-word characters. Note: the underscore "_" is considered a "word" character, so it will not act as a separator here.

    re.split(r'\W+', 'Words, words, words.')
    

    See https://docs.python.org/3/library/re.html for more examples; search the page for "re.split".
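
One thing to watch for: when the line starts or ends with punctuation or whitespace, re.split returns empty strings at the edges. Here is a minimal sketch of one way to handle that, plus a variant for when you also want underscores treated as separators. The helper names split_words and split_words_and_underscores are made up for illustration and are not part of the re module.

```python
import re

def split_words(line):
    # Hypothetical helper: split on runs of non-word characters
    # (whitespace and punctuation), then drop the empty strings that
    # re.split leaves at the start or end of the result.
    return [tok for tok in re.split(r"\W+", line) if tok]

def split_words_and_underscores(line):
    # Hypothetical variant: "_" is a word character, so add it to the
    # character class explicitly if it should also act as a separator.
    return [tok for tok in re.split(r"[\W_]+", line) if tok]

print(re.split(r"\W+", "Words, words, words."))
# ['Words', 'words', 'words', '']  <- note the trailing empty string

print(split_words("Words, words, words."))
# ['Words', 'words', 'words']

print(split_words_and_underscores("snake_case, yes!"))
# ['snake', 'case', 'yes']
```

The raw-string prefix (r"...") keeps Python from interpreting the backslash in \W as a string escape before the regex engine sees it.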