I'm trying to use re.findall(pattern, string) to match every digit together with however many duplicates follow it in a string. E.g. "1222344" should match "1", "222", "3", "44". I can't seem to find a pattern that does this, though.
I tried the pattern "(\d)\1+" to match a digit one or more times, but it doesn't seem to work: when I print the result, it shows up as an empty list [].
You're on the right track, but your pattern (\d)\1+ actually matches two or more contiguous identical digits: the first digit is matched by \d, and then the + quantifier says to match one or more repeats of that digit. So what you want is (\d)\1*, where the * says to match zero or more repeats of that previous digit.
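To see the difference between the two quantifiers, you can run both patterns on the string from your question. This is only a minimal sketch, and the helper name runs is made up for illustration. It also uses raw strings (r"..."): in an ordinary Python string literal, "\1" is an octal escape for the character chr(1) rather than a backreference, which is likely why you got an empty list instead of just the repeated runs.

```python
import re

def runs(pattern, text):
    """Return the full text of every match (hypothetical helper, for illustration only)."""
    return [mo.group(0) for mo in re.finditer(pattern, text)]

text = "1222344"

# + requires at least one repeat, so single digits are skipped
plus_runs = runs(r"(\d)\1+", text)

# * allows zero repeats, so every run is matched
star_runs = runs(r"(\d)\1*", text)

# Same value as the non-raw literal "(\d)\1+": here "\1" is chr(1),
# so the regex looks for a digit followed by chr(1) characters
non_raw = re.findall("(\\d)\1+", text)

print(plus_runs)  # ['222', '44']
print(star_runs)  # ['1', '222', '3', '44']
print(non_raw)    # []
```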
The other thing that is perhaps confusing is that, when the pattern contains a capturing group, re.findall() only returns a list of the matched subexpressions (in this case the individual digit). To see the entire string matched, you can use re.search() or re.finditer() to get a match object, then access the entire matched string using mo.group(0).
import re
text = "122333444455555666666"
patt = re.compile(r"(\d)\1*")
print()
print(patt.findall(text))  # print list of JUST the first digit in each run
print()
for mo in patt.finditer(text):  # iterate over all the match objects
    print(mo.group(0))  # group(0) is the entire matched string
Output is:
['1', '2', '3', '4', '5', '6']
1
22
333
4444
55555
666666
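If you would rather stay with re.findall() and get the runs as a plain list, one option is to wrap the whole expression in an outer group. This is a sketch of that variation, not the only way to do it: the outer parentheses become group 1, so the digit becomes group 2 and the backreference changes to \2. The names all_runs, run and _digit are just illustrative.

```python
import re

text = "122333444455555666666"

# Group 1 wraps the whole run; group 2 is the single digit being repeated,
# so the backreference is \2 instead of \1.
patt = re.compile(r"((\d)\2*)")

# With two groups, findall yields one (run, digit) tuple per match;
# keep only the run.
all_runs = [run for run, _digit in patt.findall(text)]

print(all_runs)  # ['1', '22', '333', '4444', '55555', '666666']
```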