[SOLVED] How to find all occurrences of a substring in a string while ignoring some characters in Python?

I'd like to find all occurrences of a substring while ignoring some characters. How can I do it in Python?

Example:

long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
chars_to_ignore = ['"', '`']
print(find_occurrences(long_string, small_string, chars_to_ignore))

should print [(10, 16), (27, 31)] because we want to ignore the characters ` and ".

(10, 16) is the start and end index of t`es"t in long_string,
(27, 31) is the start and end index of test in long_string.

Using re.finditer with small_string as the pattern doesn't work, because it does not ignore the characters ` and ":

import re
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
# Build the list directly; re.escape keeps small_string literal
matches = [(m.start(), m.end()) for m in re.finditer(re.escape(small_string), long_string)]
print(matches)

outputs [(27, 31)], which missed (10, 16).

Solution

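Try building a regex that allows any run of the ignored characters between consecutive characters of small_string: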

import re

long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
chars_to_ignore = ['"', "`"]

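# Optional run of ignored characters, placed between each pair of letters
tmp = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
# Escape each character of small_string so it is matched literally
for m in re.finditer(tmp.join(re.escape(c) for c in small_string), long_string):
    print(m)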

Prints:

<re.Match object; span=(10, 16), match='t`es"t'>
<re.Match object; span=(27, 31), match='test'>
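
If you want the list of (start, end) tuples from the question rather than match objects, the same idea can be wrapped in a function. This is only a sketch: the name find_occurrences and its third parameter come from the question's example, while the handling of an empty small_string and an empty chars_to_ignore are my own assumptions.

```python
import re


def find_occurrences(long_string, small_string, chars_to_ignore=()):
    """Return (start, end) spans of small_string in long_string,
    allowing any run of chars_to_ignore between its characters."""
    if not small_string:
        # Assumption: an empty search string yields no matches
        return []
    if chars_to_ignore:
        # Optional run of ignored characters between letters
        gap = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
    else:
        # Nothing to ignore: this becomes a plain substring search
        gap = ""
    pattern = gap.join(re.escape(c) for c in small_string)
    return [m.span() for m in re.finditer(pattern, long_string)]


long_string = 'this is a t`es"t. Does the test work?'
print(find_occurrences(long_string, "test", ['"', "`"]))
# [(10, 16), (27, 31)]
```

Note that ignored characters are only skipped between the characters of small_string, not before the first or after the last one, and that re.finditer returns non-overlapping matches.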