Tags: python, string, python-re

How to find all occurrences of a substring in a string while ignoring some characters in Python?


I'd like to find all occurrences of a substring while ignoring some characters. How can I do it in Python?

Example:

long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
chars_to_ignore = ['"', '`']
print(find_occurrences(long_string, small_string, chars_to_ignore))

This should return [(10, 16), (27, 31)], because the characters ` and " are to be ignored when matching.


Using re.finditer directly doesn't work, because it does not skip the characters ` and ":

import re
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test"
matches = [(m.start(), m.end()) for m in re.finditer(small_string, long_string)]
print(matches)

This outputs [(27, 31)], missing (10, 16).


Solution

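  • Try building a regex that allows any number of ignored characters between consecutive characters of the substring: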

    import re
    
    long_string = 'this is a t`es"t. Does the test work?'
    small_string = "test"
    chars_to_ignore = ['"', "`"]
    
    tmp = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
    for m in re.finditer(tmp.join(re.escape(c) for c in small_string), long_string):
        print(m)
    

    Prints:

    <re.Match object; span=(10, 16), match='t`es"t'>
    <re.Match object; span=(27, 31), match='test'>
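
The question asks for a find_occurrences function that returns a list of (start, end) tuples, whereas the snippet above prints match objects. A minimal sketch of such a wrapper around the same regex idea might look like the following. The three-argument signature and the handling of an empty substring or an empty ignore list are assumptions, not something specified in the question.

```python
import re

def find_occurrences(long_string, small_string, chars_to_ignore=()):
    """Return (start, end) spans of small_string in long_string,
    allowing any of chars_to_ignore between consecutive characters.

    Assumed behaviour: an empty small_string yields no matches.
    """
    if not small_string:
        return []
    if chars_to_ignore:
        # Optional run of ignored characters, e.g. (?:"|`)*
        gap = "(?:" + "|".join(re.escape(c) for c in chars_to_ignore) + ")*"
    else:
        # Nothing to ignore: fall back to a plain literal search
        gap = ""
    # Insert the optional gap between every pair of escaped characters
    pattern = gap.join(re.escape(c) for c in small_string)
    return [m.span() for m in re.finditer(pattern, long_string)]

long_string = 'this is a t`es"t. Does the test work?'
print(find_occurrences(long_string, "test", ['"', '`']))
# [(10, 16), (27, 31)]
```

Because the gap is only placed between characters of the substring, ignored characters before the first or after the last matched character are not included in the returned span.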