Is there a way to ignore the special meaning of characters when creating a regular expression in Python? In other words, to take the string "as is"?
I am writing code that internally uses the expect method of a Telnet object, which only accepts regular expressions. Therefore, the answer cannot be the obvious "use == instead of a regular expression".
I tried this:
import re

SPECIAL_CHARACTERS = "\\.^$*+?{}[]|():"  # backslash must be placed first

def str_to_re(s):
    # Prefix every special character with a backslash so it is matched literally.
    result = s
    for c in SPECIAL_CHARACTERS:
        result = result.replace(c, '\\' + c)
    return re.compile(result)

TEST = "Bob (laughing). Do you know 1/2 equals 2/4 [reference]?"
re_bad = re.compile(TEST)
re_good = str_to_re(TEST)
print(re_bad.match(TEST))   # None: the parentheses, brackets, etc. are read as regex syntax
print(re_good.match(TEST))  # a match object
It works, since the first one does not recognize the string, and the second one does. I looked at the options in the Python documentation, and was not able to find a simpler way. Or are there any cases my solution does not cover (I used the Python docs to build SPECIAL_CHARACTERS)?
P.S. The problem can apply to other libraries. It does not apply to the pexpect library, because it provides the expect_exact method, which solves this problem. However, someone could want to specify a mix of strings (as is) and regular expressions.
If 'reg' is the regex itself, written as a literal, use a raw string as follows:
pat = re.compile(r'reg')
If reg is a name bound to a str that should be matched literally, escape it first:
reg = re.escape(reg)
pat = re.compile(reg)
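To tie this back to the question, here is a minimal sketch of re.escape applied to the TEST string from above, and of mixing an escaped literal with a real regex fragment. The "Total (USD): " prompt and the \d+\.\d{2} suffix are made-up examples, not anything from the original code.

```python
import re

TEST = "Bob (laughing). Do you know 1/2 equals 2/4 [reference]?"

# re.escape backslash-escapes the characters that are special in a regex,
# so the compiled pattern matches the text literally.
literal = re.compile(re.escape(TEST))
print(literal.match(TEST) is not None)

# Mixing "as is" text with a regular expression: escape only the literal part
# and concatenate it with the regex part (both parts here are hypothetical).
prompt = re.compile(re.escape("Total (USD): ") + r"\d+\.\d{2}")
print(prompt.match("Total (USD): 12.50") is not None)
print(prompt.match("Total USD: 12.50") is not None)
```

The pattern string returned by re.escape (or the compiled object) can then be handed to any API that insists on regular expressions, which covers the mixed case mentioned in the P.S. without maintaining a hand-written list of special characters.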