I'd like to find all occurrences of a substring while ignoring some characters. How can I do it in Python?
Example:
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
chars_to_ignore = ['"', '`']
print(find_occurrences(long_string, small_string, chars_to_ignore))
This should return [(10, 16), (27, 31)], because the characters ` and " are to be ignored: (10, 16) is the start and end index of t`es"t in long_string, and (27, 31) is the start and end index of test in long_string. (The end index is exclusive, as in slicing: long_string[10:16] == 't`es"t'.)
Using re.finditer won't work, because it does not ignore the characters ` and ":
import re
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
matches = [(m.start(), m.end()) for m in re.finditer(small_string, long_string)]
print(matches)
This outputs [(27, 31)], which misses (10, 16).
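Try building a pattern that allows any number of the ignored characters between consecutive characters of small_string: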
import re
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
chars_to_ignore = ['"', "`"]
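# tmp matches zero or more of the ignored characters, e.g. (?:"|`)*
tmp = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
# Put tmp between the escaped characters of small_string: t(?:"|`)*e(?:"|`)*s(?:"|`)*t
for m in re.finditer(tmp.join(re.escape(c) for c in small_string), long_string):
    print(m)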
Prints:
<re.Match object; span=(10, 16), match='t`es"t'>
<re.Match object; span=(27, 31), match='test'>
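To get exactly the list of (start, end) tuples the question asks for, the same pattern can be wrapped in a function. This is a sketch: the three-parameter signature of find_occurrences and the handling of an empty small_string or an empty chars_to_ignore are my own assumptions, not something the answer above covers.

```python
import re


def find_occurrences(long_string, small_string, chars_to_ignore):
    """Return (start, end) spans of small_string in long_string, allowing
    any run of the characters in chars_to_ignore between its characters.

    End indices are exclusive, as with re.Match.span().
    """
    # Assumption: an empty needle yields no matches instead of one per position.
    if not small_string:
        return []
    if chars_to_ignore:
        # (?:a|b)* matches zero or more of the ignored characters.
        gap = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
    else:
        # Assumption: nothing to ignore means a plain literal search.
        gap = ""
    # The gap goes only between consecutive characters, so ignored characters
    # before the first or after the last character are not part of a match.
    pattern = gap.join(re.escape(c) for c in small_string)
    return [m.span() for m in re.finditer(pattern, long_string)]


long_string = 'this is a t`es"t. Does the test work?'
print(find_occurrences(long_string, "test", ['"', "`"]))
# [(10, 16), (27, 31)]
```

Like re.finditer itself, this only reports non-overlapping matches.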