python regex split capturing-group character-class

Regex [] vs () in Python with respect to re.split()


What is the difference between [,.] and (,|\.) when used as a pattern in re.split(pattern, string)? Can someone please explain with respect to this example in Python:

import re
regex_pattern1 = r"[,\.]"
regex_pattern2 = r"(,|\.)"
print(re.split(regex_pattern1, '100,000.00')) #['100', '000', '00']
print(re.split(regex_pattern2, '100,000.00')) #['100', ',', '000', '.', '00']

Solution

  • [,\.] is equivalent to ,|\. [1]

    (,|\.) is equivalent to ([,\.]).

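    () creates a capturing group, and re.split returns the text captured by each group in addition to the pieces of the string separated by the pattern. A non-capturing group, (?:...), groups without capturing, so it adds nothing extra to the result.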

    >>> import re
    >>> re.split(r'([,\.])', '100,000.00')
    ['100', ',', '000', '.', '00']
    >>> re.split(r'(,|\.)', '100,000.00')
    ['100', ',', '000', '.', '00']
    >>> re.split(r',|\.', '100,000.00')
    ['100', '000', '00']
    >>> re.split(r'(?:,|\.)', '100,000.00')
    ['100', '000', '00']
    >>> re.split(r'[,\.]', '100,000.00')
    ['100', '000', '00']
    

    1. You might sometimes need (?:,|\.) to limit what counts as the operands of | when you embed it in a larger pattern, though.
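
To illustrate footnote 1, here is a minimal sketch of what happens when the alternation is embedded in a larger pattern. The sample string and the "delimiter followed by a space" rule are made up for illustration and are not from the question.

```python
import re

# Hypothetical input: we only want to split on "," or "." when a space follows.
text = "1, 2. 3,4"

# Without a group, | splits the whole pattern in two:
# either "," alone, or "\." followed by a space.
ungrouped = re.split(r",|\. ", text)

# With a non-capturing group, | only chooses between "," and "\.",
# and the trailing space applies to both alternatives.
grouped = re.split(r"(?:,|\.) ", text)

# A capturing group scopes | the same way, but re.split also
# returns the captured delimiters.
captured = re.split(r"(,|\.) ", text)

print(ungrouped)  # ['1', ' 2', '3', '4']
print(grouped)    # ['1', '2', '3,4']
print(captured)   # ['1', ',', '2', '.', '3,4']
```

For single characters like these, the character class [,.] followed by a space would do the same job as the non-capturing group; the group becomes necessary once the alternatives are longer than one character.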