I'm trying to find the index of every occurrence of a substring in a string in Python. My current regex code can't find a match that starts inside a previous match.
I have a string s = r'GATATATGCATATACTT' and a substring t = r'ATAT'. There should be matches at index 1, 3, and 9. The following code only shows matches at index 1 and 9, because the match at index 3 begins inside the first match. How do I get all matches to appear?
Thanks so much!
import re
s = 'GATATATGCATATACTT'
t = r'ATAT'
pattern = re.compile(t)
# finditer only yields non-overlapping matches
for i in pattern.finditer(s):
    print(i)
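Since the matches overlap, put the pattern in a capturing group inside a lookahead: (?=(YOUR_EXPR)). A lookahead matches without consuming any characters, so the search resumes one character further on instead of after the end of the match.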
import re
s = 'GATATATGCATATACTT'
t = r'(?=(ATAT))'
pattern = re.compile(t)
# each match is zero-width; the captured text is in group 1
for i in pattern.finditer(s):
    print(i)
Output:
<re.Match object; span=(1, 1), match=''>
<re.Match object; span=(3, 3), match=''>
<re.Match object; span=(9, 9), match=''>
Or:
for i in pattern.finditer(s):
    print(i.start())
Output:
1
3
9
Or:
import re
s = 'GATATATGCATATACTT'
t = 'ATAT'
pattern = re.compile(f'(?=({t}))')
# group(1) is the text captured inside the lookahead
print([(i.start(), i.group(1)) for i in pattern.finditer(s)])
Output:
[(1, 'ATAT'), (3, 'ATAT'), (9, 'ATAT')]
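One caveat with building the pattern from a variable: if the substring can contain regex metacharacters such as '.' or '+', it should be passed through re.escape first. Below is a minimal sketch of that, plus a regex-free alternative using str.find. The function names find_overlapping and find_overlapping_plain are made up for illustration and are not part of any library.

```python
import re

def find_overlapping(text, sub):
    """Return start indices of all (possibly overlapping) occurrences of sub."""
    # re.escape keeps characters such as '.' or '+' in sub from being
    # interpreted as regex syntax.
    pattern = re.compile(f'(?=({re.escape(sub)}))')
    return [m.start() for m in pattern.finditer(text)]

def find_overlapping_plain(text, sub):
    """Same result without regex: resume the search one character after each hit."""
    starts = []
    pos = text.find(sub)
    while pos != -1:
        starts.append(pos)
        pos = text.find(sub, pos + 1)
    return starts

print(find_overlapping('GATATATGCATATACTT', 'ATAT'))        # [1, 3, 9]
print(find_overlapping_plain('GATATATGCATATACTT', 'ATAT'))  # [1, 3, 9]
```

For a fixed literal substring like this one, the str.find loop avoids regex entirely; the lookahead version is the better fit when the thing being searched for is a real pattern.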