
Avoid negative lookahead regex in mail spam detection


Typical legitimate email "header from" fields look like:

From: DHL <noreply@dhl.com>

From: DHL <legit.sender@noreply.dhl.com>

while non-legitimate email "header from" fields look like:

From: DHL <fake.sender@somedomain.com>

I would like to write a regex that matches non-legitimate "header from" fields. That is, if DHL appears after "From:", the address inside the <> must end with dhl.com, and the regex should match whenever it does not.

I came up with the following regex using negative lookahead (I'm not sure it's 100% exact, but it seems to work):

^From: DHL <.*@(?!.*dhl\.com>$)

The problem is that my regex engine does NOT support negative lookahead, so I'm trying to replace it with an equivalent non-capturing group like:

(?:[^d]|d[^h]|dh[^l]|dhl[^\.]|dhl\.[^c]|dhl\.c[^o]|dhl\.co[^m])

without success so far. Any ideas?

If there is a solution, I would also like it to handle mixed case (like DhL.COm).

I am looking for a Postfix solution, ideally one which does not require PCRE.


Solution

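  • Since you can't use negative lookahead and you only know what you don't want to match, you have to test the end of the address character by character, working backwards from the ">" and joining one alternative per position with the | operator. Each alternative matches an address that first deviates from "dhl.com" at that position.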

    Something like this should work:

    ^From: DHL <.*@.*[^mM]>$|^From: DHL <.*@.*[^oO][mM]>$|^From: DHL <.*@.*[^cC][oO][mM]>$|^From: DHL <.*@.*[^.][cC][oO][mM]>$|^From: DHL <.*@.*[^lL][.][cC][oO][mM]>$|^From: DHL <.*@.*[^hH][lL][.][cC][oO][mM]>$|^From: DHL <.*@.*[^dD][hH][lL][.][cC][oO][mM]>$
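
    To check a pattern like this before putting it into a mail filter, it can help to generate the alternation and run it against the sample headers from the question. The following is only a minimal sketch in Python (used here purely as a test harness, not as part of any Postfix setup). The helper names `char_class` and `build_pattern` are made up for this illustration, and every alternative ends in `>$`.

```python
import re

PREFIX = r"^From: DHL <.*@.*"   # fixed part taken from the answer's regex
SUFFIX = "dhl.com"              # required ending of the address


def char_class(ch, negate=False):
    """Return a bracket expression for ch; letters match both cases."""
    body = ch.lower() + ch.upper() if ch.isalpha() else ch
    return "[" + ("^" if negate else "") + body + "]"


def build_pattern(suffix=SUFFIX):
    """One alternative per position: the first mismatch, read from the end."""
    alternatives = []
    for i in range(len(suffix) - 1, -1, -1):
        wrong_char = char_class(suffix[i], negate=True)
        right_tail = "".join(char_class(c) for c in suffix[i + 1:])
        alternatives.append(PREFIX + wrong_char + right_tail + ">$")
    return "|".join(alternatives)


PATTERN = re.compile(build_pattern())

samples = [
    "From: DHL <noreply@dhl.com>",
    "From: DHL <legit.sender@noreply.dhl.com>",
    "From: DHL <fake.sender@somedomain.com>",
    "From: DHL <someone@DhL.COm>",
]
for header in samples:
    verdict = "flagged" if PATTERN.search(header) else "ok"
    print(verdict, header)
```

    Of these four samples, only the somedomain.com header should be flagged. Two limitations are worth keeping in mind: an address such as x@evildhl.com also ends with dhl.com and is therefore not flagged (the lookahead version in the question behaves the same way), and a domain shorter than the suffix, such as x@hl.com, is missed because each alternative needs at least one character between the @ and the mismatching position.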