python, regex, telnetlib

Ignore special characters when creating a regular expression in python


Is there a way to ignore the special meaning of characters when creating a regular expression in Python? In other words, to take the string "as is".

I am writing code that internally uses the expect method of a Telnet object, which accepts only regular expressions. Therefore, the answer cannot be the obvious "use == instead of a regular expression".

I tried this:

import re

SPECIAL_CHARACTERS = "\\.^$*+?{}[]|():"  # backslash must be placed first
def str_to_re(s):
  result = s
  for c in SPECIAL_CHARACTERS:
    result = result.replace(c,'\\'+c)
  return re.compile(result)

TEST = "Bob (laughing).  Do you know 1/2 equals 2/4 [reference]?"
re_bad = re.compile(TEST)
re_good = str_to_re(TEST)

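print(re_bad.match(TEST))   # None: the parentheses, brackets and '?' act as regex syntax
print(re_good.match(TEST))  # a match object: every character is matched literally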

It works: the first pattern does not match the string, while the second one does. I looked through the options in the Python documentation and could not find a simpler way. Is there one? Or are there cases my solution does not cover (I built SPECIAL_CHARACTERS from the Python docs)?
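To probe the "cases not covered" part of the question, here is a small check, assuming Python 3. escape_manually is a hypothetical rename of str_to_re that returns the escaped string instead of compiling it, so it can be reused with different flags. With default flags, both the hand-written list and re.escape make every character in string.punctuation match literally. The difference appears under re.VERBOSE, where unescaped whitespace is ignored and '#' starts a comment; re.escape also escapes those characters, the hand-written list does not.

```python
import re
import string

# Same list as in the question (backslash first, so it is not escaped twice).
SPECIAL_CHARACTERS = "\\.^$*+?{}[]|():"

def escape_manually(s):
    """Hypothetical variant of str_to_re that returns the escaped pattern string."""
    result = s
    for c in SPECIAL_CHARACTERS:
        result = result.replace(c, "\\" + c)
    return result

# Default flags: both escapers make the whole sample match verbatim.
sample = string.punctuation + " \t"
assert re.fullmatch(escape_manually(sample), sample)
assert re.fullmatch(re.escape(sample), sample)

# Under re.VERBOSE, the unescaped space is skipped and '#' begins a comment,
# so the hand-escaped pattern shrinks to just "a" and no longer matches.
text = "a # b"
print(re.fullmatch(escape_manually(text), text, re.VERBOSE))          # None
print(re.fullmatch(re.escape(text), text, re.VERBOSE) is not None)    # True
```

This only matters if the escaped text is later embedded in a pattern compiled with re.VERBOSE, but it is one case the hand-written list misses.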

P.S. The same problem can arise with other libraries. It does not arise with the pexpect library, because pexpect provides an expect_exact method that solves it. However, someone might want to specify a mix of plain strings (taken as is) and regular expressions.
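For the mixed case, one option is to escape only the parts that should be taken as is and leave the rest as regex syntax. The sketch below assumes Python 3, where a Telnet session deals in bytes, so the patterns are bytes patterns. The prompts, the helper literal and the function first_match are all hypothetical; first_match only imitates the documented return shape of telnetlib's Telnet.expect (index of the first pattern in the list that matches, plus the match) so the example runs without a server. With a real session, the PATTERNS list itself would be what is passed to tn.expect(PATTERNS, timeout). Note that telnetlib was deprecated in Python 3.11 and removed in 3.13.

```python
import re

def literal(s):
    """Hypothetical helper: compile bytes s so every byte is matched as is."""
    return re.compile(re.escape(s))

# Hypothetical prompts: a mix of verbatim strings and genuine regexes.
PATTERNS = [
    literal(b"Password (again):"),                        # matched verbatim
    re.compile(rb"login: ?$"),                             # ordinary regex
    re.compile(re.escape(b"[root@box ~]") + rb"[#$] "),    # literal prefix + regex tail
]

def first_match(patterns, data):
    """Hypothetical stand-in for Telnet.expect: return (index, match) of the
    first pattern, in list order, found anywhere in data, else (-1, None)."""
    for i, pat in enumerate(patterns):
        m = pat.search(data)
        if m:
            return i, m
    return -1, None

idx, m = first_match(PATTERNS, b"Changing password.\r\nPassword (again): ")
print(idx, m.group())  # 0 b'Password (again):'
```

Escaping piecewise and concatenating the results keeps the literal and regex parts clearly separated, which is roughly what expect_exact gives you in pexpect, but with the option of adding real regex syntax where needed.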


Solution

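  • If 'reg' is a regex you write directly in the source, use a raw string so that Python does not process the backslashes before the re module sees them:

    pat = re.compile(r'reg')

    Note that a raw string does not remove the regex meaning of characters such as . ( [ ?; it only turns off Python's own string escape sequences.

  • If reg is a variable holding a string that must be matched as is, escape it first with re.escape, which backslash-escapes every regex metacharacter:

    reg = re.escape(reg)
    pat = re.compile(reg)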
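Applied to the question's own TEST string, re.escape does the job of the hand-written str_to_re. The short sketch below (Python 3) also shows why a raw string alone is not enough: r"1.5" still lets '.' match any character, while the re.escape version matches only the literal text.

```python
import re

TEST = "Bob (laughing).  Do you know 1/2 equals 2/4 [reference]?"

# re.escape backslash-escapes every regex metacharacter in TEST,
# so the compiled pattern matches the string literally.
pat = re.compile(re.escape(TEST))
print(pat.match(TEST) is not None)               # True

# A raw string only stops Python from processing backslashes;
# the regex engine still reads '.' as "any character".
print(re.match(r"1.5", "1x5") is not None)             # True
print(re.match(re.escape("1.5"), "1x5") is not None)   # False
```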