python regex negative-lookbehind

Regex pattern to match all, and return null when specific words are found


I have this regular expression: ^BRN.*?(?:paid|to)\s([A-Za-z\s]+)\b(?<!\bself)

I want it to return the words after the required pattern, but only if certain words are not found. If they are found, the regex shouldn't return anything. Thus,

BRN CLG-CI IQ PAID IONANDA PAUL

should return IONANDA PAUL, which it does. So it's correct there. But I want

BRN-TO CASH SELF

to return a null string, or essentially to match but return no output. Currently, the regex returns CASH\s (the \s means a trailing whitespace is included in the output). I tried a negative lookbehind, but I am still looking for a way to return nothing at all if the word is found. Thanks!


Solution

  • Note that your regex captures CASH (with the trailing space) in BRN-TO CASH SELF because ([A-Za-z\s]+) first grabs CASH SELF greedily. The word boundary after SELF matches, but the negative lookbehind fails there, which triggers backtracking: the engine gives back one character at a time until it reaches the word boundary right before SELF. At that position there is no whole word SELF immediately to the left, so the lookbehind passes and the engine reports a valid match.

    You can use a negative lookahead after \s:

    ^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)
    #                   ^^^^^^^^^^^^^^^^^^^^^^^
    


    Now, right after the whitespace that follows paid or to is matched, the negative lookahead is checked once: if the whole word self appears after zero or more ASCII letters or whitespace characters, the match fails at that position; otherwise, the pattern goes on to capture the words that follow.
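
For comparison, here is a minimal Python sketch that runs both patterns against the two sample strings from the question. It assumes the patterns are compiled with re.IGNORECASE (the question does not show the flags, but the lowercase paid/to/self would not match the uppercase input otherwise). The helper name extract_name is hypothetical, and returning an empty string on a failed match is one possible way to get the "no output" behaviour the question asks for.

```python
import re

# Assumption: case-insensitive matching, since the sample input is uppercase.
ORIGINAL = re.compile(
    r"^BRN.*?(?:paid|to)\s([A-Za-z\s]+)\b(?<!\bself)", re.IGNORECASE
)
FIXED = re.compile(
    r"^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)", re.IGNORECASE
)


def extract_name(pattern, text):
    """Return capture group 1, or an empty string when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else ""


samples = ["BRN CLG-CI IQ PAID IONANDA PAUL", "BRN-TO CASH SELF"]
for text in samples:
    # repr() makes the trailing space in 'CASH ' visible.
    print(text, "->", repr(extract_name(ORIGINAL, text)), repr(extract_name(FIXED, text)))
```

With the original pattern the second string yields 'CASH ' after backtracking; with the lookahead version the pattern does not match that string at all, so the helper returns ''.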