[SOLVED] Python Regex Finding a Match That Starts Inside Previous match

I'm looking to find the indices of all occurrences of a substring in a string in Python. My current regex code can't find a match that starts inside a previous match.

I have a string s = r'GATATATGCATATACTT' and a substring t = r'ATAT'. There should be matches at indices 1, 3, and 9. The following code only shows matches at indices 1 and 9, because index 3 falls within the first match. How do I get all matches to appear?

Thanks so much!

import re

s = 'GATATATGCATATACTT'
t = r'ATAT'

pattern = re.compile(t)

for i in pattern.finditer(s):
    print(i)

Solution

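Since the matches overlap, wrap the expression in a capturing group inside a lookahead: (?=(YOUR_EXPR)). A lookahead is zero-width, so each match consumes no characters and the scan resumes at the next position instead of skipping past the matched text. For the same reason, each match object has an empty span, and the matched text is available from group 1.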

import re

s = 'GATATATGCATATACTT'
# The lookahead matches at every position where ATAT begins, without consuming it.
t = r'(?=(ATAT))'

pattern = re.compile(t)

for i in pattern.finditer(s):
    print(i)

Output:

<re.Match object; span=(1, 1), match=''>
<re.Match object; span=(3, 3), match=''>
<re.Match object; span=(9, 9), match=''>

Or:

for i in pattern.finditer(s):
    print(i.start())

Output:

1
3
9

Or:

import re

s = 'GATATATGCATATACTT'
t = 'ATAT'

pattern = re.compile(f'(?=({t}))')

# group(1) holds the text captured inside the lookahead.
print([(i.start(), i.group(1)) for i in pattern.finditer(s)])

Output:

[(1, 'ATAT'), (3, 'ATAT'), (9, 'ATAT')]
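
If the substring is meant as literal text rather than a regex, it may be safer to escape it before building the lookahead, and a plain str.find loop gives the same indices without regex at all. The following is only a sketch under that assumption; the function names find_overlapping and find_overlapping_plain are made up for illustration and are not part of the original answer.

```python
import re


def find_overlapping(text, literal):
    """Return the start index of every occurrence of literal, overlaps included."""
    # re.escape keeps characters such as '.' or '*' from being read as regex syntax.
    pattern = re.compile(f"(?=({re.escape(literal)}))")
    return [m.start() for m in pattern.finditer(text)]


def find_overlapping_plain(text, literal):
    """Same idea without regex: restart str.find one character past each hit."""
    starts = []
    pos = text.find(literal)
    while pos != -1:
        starts.append(pos)
        # Advancing by 1 (not len(literal)) is what allows overlapping hits.
        pos = text.find(literal, pos + 1)
    return starts


s = 'GATATATGCATATACTT'
print(find_overlapping(s, 'ATAT'))
print(find_overlapping_plain(s, 'ATAT'))
```

Both calls should print [1, 3, 9] for the string in the question. The regex version is convenient when you later want a real pattern instead of a literal (drop the re.escape call in that case); the str.find version avoids regex entirely for plain substrings.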