
Using re to match a digit plus any contiguous duplicates, and returning the whole run rather than just the digit


I'm trying to use re.findall(pattern, string) to match every digit in a string together with however many duplicates immediately follow it. E.g. "1222344" should give "1", "222", "3", "44". I can't seem to find a pattern that does this, though.

I tried the pattern "(\d)\1+" to match a digit one or more times, but it doesn't seem to work: when I print the result, it shows up as an empty list [].


Solution

  • You're on the right track, but your pattern (\d)\1+ actually matches two or more contiguous identical digits: \d matches the first digit, and \1+ then requires one or more repeats of that same digit. What you want is (\d)\1*, where the * means zero or more repeats of the captured digit, so a single digit on its own also matches.

    The other potentially confusing point is that when the pattern contains a capturing group, re.findall() returns only the text captured by that group (here, the individual digit), not the whole match. To see the entire matched string, use re.search() or re.finditer() to get a match object, then read the full match with mo.group(0).

    import re

    text = "122333444455555666666"
    
    patt = re.compile(r"(\d)\1*")
    
    print()
    print(patt.findall(text))       # print list of JUST first digit in each run
    
    print()
    for mo in patt.finditer(text):  # iterate over all the Match Objects 
        print(mo.group(0))          # group(0) is the entire matched string
    

    Output is:

    ['1', '2', '3', '4', '5', '6']  
    
    1
    22
    333
    4444
    55555
    666666
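
Applied to the string from the question, the same idea can be sketched as below. The variable names are only illustrative. The last part shows one possible explanation for the empty list, on the assumption (the question does not show the full code) that the pattern was written as an ordinary string without the r prefix, in which case "\1" is the control character chr(1) rather than a backreference.

```python
import re

text = "1222344"  # the example string from the question

# Raw string, so \d and \1 reach the regex engine unchanged.
# group(0) is the whole run, not just the captured digit.
runs = [m.group(0) for m in re.finditer(r"(\d)\1*", text)]
print(runs)  # ['1', '222', '3', '44']

# Alternative that keeps findall: wrap the whole run in an outer group.
# Group 1 is now the full run and group 2 the single digit,
# so the backreference becomes \2.
pairs = re.findall(r"((\d)\2*)", text)
runs_via_findall = [whole for whole, _digit in pairs]
print(runs_via_findall)  # ['1', '222', '3', '44']

# Assumed cause of the empty list: without the r prefix, "\1" is chr(1).
# "(\\d)\1+" spells out the same string as the non-raw literal "(\d)\1+".
not_raw = "(\\d)\1+"
print(re.findall(not_raw, text))  # []

# With the r prefix the original pattern does match, but only runs of
# two or more, and findall returns just the captured digit.
print(re.findall(r"(\d)\1+", text))  # ['2', '4']
```

The finditer version keeps the pattern from the answer unchanged. The outer-group version is a convenience when a plain list from findall is preferred.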