Tags: python, regex, lookbehind, negative-lookbehind

Python regex with alternatives of differing length in a look-behind


I have many logs with commands in them. I filtered for all log lines containing "useradd", but now I want to discard some false positives.

The problem is that I still want to see lines that contain both a false positive AND a real command (see test cases).

I can only use (one or more) Python regular expressions, because I am using a log analyzer program and cannot write an actual Python program. These are the expressions I tried:

(!/etc/default/|/man8/)useradd # no match
(?<!/etc/default/|/man8/)useradd # look-behind requires fixed-width pattern
(?<!fault/|/man8/)useradd # works, but that's strange
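The behaviour of the three attempts can be reproduced with Python's `re` module. This is only a minimal sketch: the variable names are made up for illustration, and the comments give one plausible reading of why each attempt behaves as it does. The third pattern compiles because both alternatives ("fault/" and "/man8/") happen to be six characters long, so the look-behind has a single fixed width.

```python
import re

# Attempt 1: "(!...)" is an ordinary capturing group that starts with a
# literal "!", not a look-behind, so a bare "useradd" cannot match.
attempt1 = re.compile(r"(!/etc/default/|/man8/)useradd")

# Attempt 2: the alternatives are 13 and 6 characters long, so the
# look-behind has no single width and compilation fails.
try:
    re.compile(r"(?<!/etc/default/|/man8/)useradd")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# Attempt 3: "fault/" and "/man8/" are both 6 characters long,
# so the look-behind is fixed-width and the pattern compiles.
attempt3 = re.compile(r"(?<!fault/|/man8/)useradd")

print(attempt1.search("useradd evil"))
print(compile_error)
print(bool(attempt3.search("cat /etc/default/useradd")))
print(bool(attempt3.search("cat /etc/default/useradd; useradd evil")))
```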

In answers to other questions the regex was changed so that a lookahead could be used - but I don't see how this is possible here.

[Edit: added some test cases]

## no match
cat /etc/default/useradd 
less /usr/share/man/ja/man8/useradd.8.gz
## match:
useradd evil
/usr/sbin/useradd
cat /etc/default/useradd; useradd evil
cat /etc/default/useradd; /usr/sbin/useradd evil
cat /etc/default/useradd; cd /usr/lib/; ../sbin/useradd evil

Solution

  • You can use a lookahead assertion instead:

    ^(?!.*(?:/etc/default|/man8)/useradd(?!.*useradd)).*useradd
    

    Explanation:

    ^               # Start of string
    (?!             # Assert that it's impossible to match...
     .*             # any string, followed by...
     (?:            # this non-capturing group containing...
      /etc/default  # either "/etc/default"
     |              # or
      /man8         # "/man8"
     )              # End of group, followed by...
     /useradd       # "/useradd"
     (?!.*useradd)  # UNLESS another "useradd" follows further up ahead.
    )               # End of lookahead
    .*              # Match anything, then match
    useradd         # "useradd"
    

    See it live on regex101.com.
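To check the pattern outside a log analyzer, it can be run against the test cases from the question. The sketch below does that with `re.search`. It also tries a second pattern that is not part of the answer above: two chained negative look-behinds, each of which is fixed-width on its own. The names `lookahead`, `chained`, `hits`, and `extra_line` are illustrative only, and `extra_line` is a made-up case that does not appear in the question.

```python
import re

# Pattern from the answer above.
lookahead = re.compile(
    r"^(?!.*(?:/etc/default|/man8)/useradd(?!.*useradd)).*useradd"
)

# Alternative (an assumption, not from the answer): one look-behind per
# prefix, so each look-behind has a fixed width of its own.
chained = re.compile(r"(?<!/etc/default/)(?<!/man8/)useradd")

# Test cases copied from the question.
should_not_match = [
    "cat /etc/default/useradd",
    "less /usr/share/man/ja/man8/useradd.8.gz",
]
should_match = [
    "useradd evil",
    "/usr/sbin/useradd",
    "cat /etc/default/useradd; useradd evil",
    "cat /etc/default/useradd; /usr/sbin/useradd evil",
    "cat /etc/default/useradd; cd /usr/lib/; ../sbin/useradd evil",
]

def hits(pattern, lines):
    """Return one boolean per line: True if the pattern is found in it."""
    return [pattern.search(line) is not None for line in lines]

for name, pattern in (("lookahead", lookahead), ("chained", chained)):
    print(name, hits(pattern, should_not_match), hits(pattern, should_match))

# Hypothetical extra case: the real command comes BEFORE the false positive.
extra_line = "useradd evil; cat /etc/default/useradd"
print(lookahead.search(extra_line) is not None)
print(chained.search(extra_line) is not None)
```

Both patterns agree on the question's test cases. They differ on the extra line: the lookahead version only rescues a line when another "useradd" follows the false positive, so it reports no match when the real command comes first, while the chained look-behinds judge each "useradd" by its own prefix. Whether that ordering occurs in real logs depends on the data.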