How do I use Python's regular expression module (re) to determine if a match has been made, or that a potential match could be made?
I want a regex pattern which searches for a pattern of words in a correct order regardless of what's between them. I want a function which returns "Yes" if found, "Maybe" if a match could still be found, or "No" if no match can be found.
We are looking for the pattern One|....|Two|....|Three. Here are some examples (note that the names, their count, and their order are not important; all I care about is the three words One, Two and Three, and the acceptable words in between are John, Malkovich, Stamos and Travolta).
Input | Result |
---|---|
One|John|Malkovich|Two|John|Stamos|Three|John|Travolta | Yes |
One|John|Two|John|Three|John | Yes |
One|Two|Three | Yes |
One|Two | Maybe |
One | Maybe |
Three|Two|One | No |
I understand the examples are not airtight, so here is what I have for the regex to get "Yes":
    if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
        return 'Yes'
Obviously if the input is Three|Two|One the above will fail, and we can return "No", but how do I check for the "Maybe" case? I thought about nesting the parentheses, like so (note, not tested):
    if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
        return 'Yes'
But I don't think that will do what I want it to do.
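For what it's worth, one way the "Maybe" case might be expressed with the standard re module is to use two patterns: one for the complete sequence and one for any valid unfinished prefix. This is only a sketch. The names FILLER, GAP, FULL, PREFIX and classify_regex are made up for illustration, and it assumes the words are joined by | with no trailing separator and that the input must start with One.

```python
import re

# Filler words from the examples above (stand-ins for the real event names).
FILLER = r"(?:John|Malkovich|Stamos|Travolta)"
# Zero or more "|filler" tokens.
GAP = rf"(?:\|{FILLER})*"

# Complete sequence: One ... Two ... Three, fillers allowed between and after.
FULL = re.compile(rf"One{GAP}\|Two{GAP}\|Three{GAP}")
# Valid but unfinished prefix: nothing yet, "One ...", or "One ... Two ...".
PREFIX = re.compile(rf"(?:One{GAP}(?:\|Two{GAP})?)?")


def classify_regex(events: str) -> str:
    """Return 'Yes', 'Maybe' or 'No' for a |-separated string of words."""
    if FULL.fullmatch(events):
        return "Yes"
    if PREFIX.fullmatch(events):
        return "Maybe"
    return "No"


print(classify_regex("One|John|Two|John|Three|John"))  # Yes
print(classify_regex("One|Two"))                       # Maybe
print(classify_regex("Three|Two|One"))                 # No
```

The prefix pattern has to be written by hand to mirror the full one, so this gets harder to maintain as the rules grow.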
I am not actually looking for Travolta and Malkovich. I am matching against inotify events such as IN_MOVE, IN_CREATE and IN_OPEN. I log them, get hundreds of them, and then go in and look for a particular pattern such as IN_ACCESS...IN_OPEN...IN_MODIFY, but in some cases I don't want an IN_DELETE after the IN_OPEN and in others I do.
I'm essentially using pattern matching on inotify events to detect when text editors go wild and do a temporary-file-swap-save instead of just modifying the file.
I don't want to free up those logs instantly, but I only want to hold on to them for as long as is necessary. "Maybe" means don't erase the logs, "Yes" means do something and then erase the logs, and "No" means don't do anything but still erase the logs.
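To make the intended handling of the three results concrete, here is a rough sketch. The function handle_verdict, the log list and the on_match callback are all hypothetical names, not part of any inotify library.

```python
def handle_verdict(verdict, log, on_match):
    """Apply the retention rule: keep the log on 'Maybe', otherwise erase it.

    `log` is a plain list of buffered event names and `on_match` is a
    callback run on a copy of the log when the verdict is 'Yes'.
    """
    if verdict == "Maybe":
        return  # a match is still possible, so keep the buffered events
    if verdict == "Yes":
        on_match(list(log))  # do something with the matched events first
    log.clear()  # both 'Yes' and 'No' end with the log erased


seen = []
log = ["IN_ACCESS", "IN_OPEN"]
handle_verdict("Maybe", log, seen.append)   # log is kept untouched
log.append("IN_MODIFY")
handle_verdict("Yes", log, seen.append)     # callback runs, then log is erased
```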
As I will have multiple rules for each program (e.g., vim vs. gedit vs. emacs), I wanted to use a regular expression, which would be more human-readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.
    matchlist.current = matchlist.first()
    for each word in input
        if word = matchlist.current
            matchlist.current = matchlist.next() // assuming next returns null if at end of list
        else if not allowedlist.contains(word)
            return 'No'
    if matchlist.current = null // we hit the end of the list
        return 'Yes'
    return 'Maybe'
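The pseudocode above could be translated to Python roughly as follows. This is a sketch, not a tested drop-in: the function name classify_events and its parameters are my own, and it assumes the input has already been split into a list of words.

```python
def classify_events(words, required, allowed):
    """Return 'Yes', 'Maybe' or 'No'.

    words    -- the observed words, in order
    required -- the words that must appear, in this order
    allowed  -- the words that may appear anywhere in between
    """
    remaining = iter(required)
    expected = next(remaining, None)  # None once every required word was seen
    for word in words:
        if word == expected:
            expected = next(remaining, None)
        elif word not in allowed:
            return "No"  # neither the next required word nor an allowed filler
    return "Yes" if expected is None else "Maybe"


required = ["One", "Two", "Three"]
allowed = {"John", "Malkovich", "Stamos", "Travolta"}

print(classify_events("One|John|Two|John|Three|John".split("|"), required, allowed))  # Yes
print(classify_events("One|Two".split("|"), required, allowed))                       # Maybe
print(classify_events("Three|Two|One".split("|"), required, allowed))                 # No
```

Each rule is then just a pair of lists, which may be easier to keep per editor than a hand-written regex.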