
Substring search with a (regex?) condition in Python


I have a situation where I want to check whether a substring exists in a large text. So far, I have simply been using:

if pattern in text: ...

But I want to ensure that the occurrence of "pattern" in "text" is not immediately preceded or followed by a letter. It's all right if it is preceded or followed by special characters, digits or whitespace.

So, if pattern is "abc", a search on "some text abc" or "random texts, abc, cde" should return True, while a search on "some textabc" or "random abctexts" should return False (because "abc" is preceded or followed by letters).
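As a quick sketch using only the four example strings above, the plain `in` test returns True for all four, including the two that should be rejected, because it never looks at the characters around the match:

```python
# The four example strings from the question, with the desired result.
examples = {
    "some text abc": True,
    "random texts, abc, cde": True,
    "some textabc": False,
    "random abctexts": False,
}

pattern = "abc"
for text, wanted in examples.items():
    # The plain substring test ignores what surrounds the match.
    got = pattern in text
    print(f"{text!r}: in -> {got}, wanted {wanted}")
```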

What is the best way to perform this operation?


Solution

  • How about this:

    import re
    
    string = "random texts, abc, cde"
    
    match = re.search(r'(^|[^a-zA-Z])abc([^a-zA-Z]|$)', string)
    # If-statement after search() tests if it succeeded
    if match:
        # group() returns the whole match, including the neighbouring characters
        print('found', match.group())
    else:
        print('did not find')
    

    "(^|[^a-zA-Z])" means: the beginning of the string OR any non-alphabetic character; "([^a-zA-Z]|$)" means the same for the end of the string.
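A variant worth considering (not part of the original answer) replaces the two groups with a negative lookbehind `(?<![a-zA-Z])` and a negative lookahead `(?![a-zA-Z])`. Lookarounds check the neighbouring characters without consuming them, so the match contains only the pattern itself, and `re.escape` lets an arbitrary `pattern` string be searched literally. The helper name `contains_standalone` is hypothetical, and this sketch assumes ASCII letters are the only neighbours to reject:

```python
import re


def contains_standalone(pattern: str, text: str) -> bool:
    """Return True if `pattern` occurs in `text` with no ASCII letter
    immediately before or after it (hypothetical helper name)."""
    # re.escape makes any regex metacharacters in `pattern` literal.
    # The lookarounds only inspect the neighbours; they do not consume them.
    regex = r"(?<![a-zA-Z])" + re.escape(pattern) + r"(?![a-zA-Z])"
    return re.search(regex, text) is not None


for text in ["some text abc", "random texts, abc, cde",
             "some textabc", "random abctexts"]:
    print(repr(text), contains_standalone("abc", text))
```

Because nothing is consumed, `re.findall` or `re.finditer` with the lookaround form also finds adjacent occurrences such as both "abc"s in "abc abc", which the group-based regex misses because the first match uses up the shared space.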

    To explain a bit more: "|" means OR, so (^|d) means "the beginning of the string or a d". The parentheses define which alternatives the OR operator applies to. You wanted your "abc" not to be enclosed by any alphabetic character. If you broaden this a little, so that the digits 0-9 and the underscore are forbidden as well, you get a simpler regex: r'(^|\W)abc(\W|$)'
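To illustrate the difference between the two regexes (a sketch; the extra sample strings are made up for illustration), the letters-only version accepts "abc" next to a digit or underscore, while the `\W` version rejects it. Note also that in Python 3, `\W` in a `str` pattern is Unicode-aware, so an accented letter such as "é" counts as a word character there, whereas `[^a-zA-Z]` treats it as an allowed neighbour:

```python
import re

letters_only = r'(^|[^a-zA-Z])abc([^a-zA-Z]|$)'
non_word = r'(^|\W)abc(\W|$)'

samples = ["some text abc", "abc1", "x_abc", "éabc"]
for s in samples:
    a = re.search(letters_only, s) is not None
    b = re.search(non_word, s) is not None
    print(f"{s!r}: letters_only={a}, non_word={b}")
```

For a pattern that starts and ends with word characters, such as "abc", the `\W` version accepts exactly the same strings as the word-boundary form r'\babc\b'.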