regex perl spamassassin

Regex help specific to SpamAssassin


I'm trying to create a filter for social security numbers and have the following regex:

\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b

The problem is that, in SpamAssassin, the regex also matches inside the following type of string, and I haven't been able to fix it:

18-007-08-9056-1462-2205

I would like it to match only if the SSN string is on its own. Examples:

18 007-08-9056 1462-2205
007-08-9056
xyz 007-08-9056
007-08-9056 xyz

Solution

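  • Your problem is that \b matches at any boundary between a word character and a non-word character. A hyphen is a non-word character, so \b is satisfied between a digit and the - next to it, and the SSN is found inside the longer dashed string. You can try something like this: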

    (?:^|[^-\d])((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4})(?:$|[^-\d])
    

    The match will then be available in $1, because the pattern also consumes the character on either side of the SSN. You might be able to find a more elegant solution based on your specific kind of input strings (e.g., will the SSN always have whitespace around it? If so, you can use \s).
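
    As a rough illustration, here is a minimal Perl sketch that runs the sample strings from the question through both the original \b version and a boundary-aware variant. It swaps the consuming groups (?:^|[^-\d]) and (?:$|[^-\d]) for the lookarounds (?<![-\d]) and (?![-\d]). That is an alternative to the pattern above rather than the same pattern, and it assumes the regex engine supports lookbehind, as Perl's does. The helper name find_ssn and the variable names are made up for this example.

```perl
use strict;
use warnings;

# Core SSN pattern from the question, without any boundary handling.
my $ssn = qr/(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}/;

# Original attempt: \b is satisfied between a digit and a hyphen.
my $with_word_boundary = qr/\b$ssn\b/;

# Variant (an assumption of this sketch): refuse a digit or hyphen on
# either side, using lookarounds so the neighbours are not consumed.
my $standalone = qr/(?<![-\d])($ssn)(?![-\d])/;

# Hypothetical helper: returns the first standalone SSN, or undef.
sub find_ssn {
    my ($text) = @_;
    return $text =~ $standalone ? $1 : undef;
}

my @samples = (
    '18-007-08-9056-1462-2205',
    '18 007-08-9056 1462-2205',
    '007-08-9056',
    'xyz 007-08-9056',
    '007-08-9056 xyz',
);

for my $text (@samples) {
    my $old   = $text =~ $with_word_boundary ? 'match' : 'no match';
    my $found = find_ssn($text);
    printf "%-26s word-boundary: %-8s standalone: %s\n",
        $text, $old, defined $found ? $found : 'no match';
}
```

    With these inputs, the \b version matches all five strings, while the lookaround variant rejects only 18-007-08-9056-1462-2205 and captures 007-08-9056 from the other four. Because lookarounds do not consume the surrounding characters, the whole match is already the SSN; the capture group is kept only to mirror the use of $1 above.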