python regex

Python Regex match or potential match


How do I use Python's regular expression module (re) to determine if a match has been made, or that a potential match could be made?

I want a regex pattern which searches for a pattern of words in a correct order regardless of what's between them. I want a function which returns "Yes" if found, "Maybe" if a match could still be found, or "No" if no match can be found.

We are looking for the pattern One|....|Two|....|Three. Here are some examples. (The names in between, their count, and their order are not important; all I care about is the three words One, Two and Three. The acceptable words in between are John, Malkovich, Stamos and Travolta.)

Input                                                     Result
One|John|Malkovich|Two|John|Stamos|Three|John|Travolta    Yes
One|John|Two|John|Three|John                              Yes
One|Two|Three                                             Yes
One|Two                                                   Maybe
One                                                       Maybe
Three|Two|One                                             No

I understand the examples are not airtight, so here is what I have for the regex to get "Yes":

if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

Obviously if the input is Three|Two|One the above will fail, and we can return "No", but how do I check for the "Maybe" case? I thought about nesting the parentheses, like so (note: not tested):

if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

But I don't think that will do what I want it to do.
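For reference, the nesting idea can be made to work with the standard `re` module if each later keyword sits inside an optional group and the whole input is required to match. The sketch below is only one possible reading of the requirements. The names `FILLER`, `PATTERN` and `classify_regex` are made up for illustration, and it assumes the input always consists of complete `|`-separated words.

```python
import re

# Hypothetical filler alternation; the words allowed between the keywords.
FILLER = r"(?:John|Malkovich|Stamos|Travolta)"

# Each later keyword lives inside an optional named group, so any valid
# prefix of the full One ... Two ... Three sequence still matches.
PATTERN = re.compile(
    rf"One(?:\|{FILLER})*"
    rf"(?P<two>\|Two(?:\|{FILLER})*"
    rf"(?P<three>\|Three(?:\|{FILLER})*)?)?"
)

def classify_regex(text):
    """Return 'Yes', 'Maybe' or 'No' for a '|'-separated string of words."""
    m = PATTERN.fullmatch(text)  # the entire input must fit the pattern
    if m is None:
        return "No"
    # The sequence is complete only if the innermost group took part.
    return "Yes" if m.group("three") is not None else "Maybe"
```

With this pattern, `classify_regex("One|Two")` gives "Maybe" and `classify_regex("Three|Two|One")` gives "No". An empty string is reported as "No" here, which may or may not be what is wanted.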

Background

I am not actually looking for Travolta and Malkovich. I am matching against inotify events such as IN_MOVE, IN_CREATE and IN_OPEN. I log them, end up with hundreds of them, and then go in and look for a particular pattern such as IN_ACCESS...IN_OPEN...IN_MODIFY. In some cases I don't want an IN_DELETE after the IN_OPEN, and in others I do.

I'm essentially using pattern matching on inotify events to detect when text editors have gone wild and are doing a temporary-file-swap-save instead of just modifying the file.

I don't want to free up those logs instantly, but I only want to hold on to them for as long as necessary. "Maybe" means don't erase the logs. "Yes" means do something, then erase the logs. "No" means don't do anything, but still erase the logs.
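In code, that retention policy might look roughly like the sketch below. `apply_result` and `on_match` are hypothetical names chosen for illustration; they are not part of inotify or any library.

```python
def apply_result(result, log, on_match):
    """Return the log entries to keep after acting on a classifier result.

    result   -- "Yes", "Maybe" or "No"
    log      -- list of logged events collected so far
    on_match -- callback invoked with the log when the pattern is complete
    """
    if result == "Maybe":
        return log       # a match may still complete, so keep everything
    if result == "Yes":
        on_match(log)    # act on the matched events before discarding them
    return []            # "Yes" and "No" both erase the log
```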

As I will have multiple rules for each program (e.g., vim vs. gedit vs. emacs), I wanted to use a regular expression, which would be more human-readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.


Solution

  • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

    Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.

    matchlist.current = matchlist.first()
    for each word in input
        if word == matchlist.current
            matchlist.current = matchlist.next() // assuming next returns null if at end of list
        else if not allowedlist.contains(word)
            return 'No'
    if matchlist.current == null // we hit the end of the list
        return 'Yes'
    return 'Maybe'
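The pseudocode above could be translated into Python roughly as follows. This is a minimal sketch, not a definitive implementation. The function name `classify_events` and its default arguments are assumptions taken from the example in the question, and the input is assumed to be a single `|`-separated string.

```python
def classify_events(
    text,
    required=("One", "Two", "Three"),
    allowed=("John", "Malkovich", "Stamos", "Travolta"),
):
    """Return 'Yes', 'Maybe' or 'No' for a '|'-separated string of words."""
    idx = 0  # position of the next required word we are waiting for
    for word in text.split("|"):
        if idx < len(required) and word == required[idx]:
            idx += 1                 # found the next required word in order
        elif word not in allowed:
            return "No"              # unexpected or out-of-order word
    # All required words seen means a full match; otherwise it may still come.
    return "Yes" if idx == len(required) else "Maybe"
```

Per-program rules would then just be different `required` and `allowed` tuples, for example `classify_events(log, required=("IN_ACCESS", "IN_OPEN", "IN_MODIFY"), allowed=("IN_CREATE",))`, where the chosen event names are purely illustrative. Note that, like the pseudocode, this version accepts allowed words before the first required word and rejects a required word that shows up again out of turn.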