python regex metacharacters

Escape all metacharacters in Python


I need to search for patterns which may have many metacharacters. Currently I use a long regex.

prodObjMatcher=re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", re.S|re.M|re.I|re.X)

(My actual pattern is very long, so I have pasted only the relevant portion that I need help with.)

This is especially painful when I need to write combinations of such patterns in a single re compilation.

Is there a Pythonic way to shorten the pattern?


Solution

  • Look, your pattern can be reduced to

    r"""^(?P<nodeName>[]\w/:[<>@$]+).*?"""
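
As a quick sanity check, the sketch below compiles both versions with the flags from the question and compares what nodeName captures. The sample lines are invented for illustration; they do not come from the question.

```python
import re

FLAGS = re.S | re.M | re.I | re.X

# Pattern from the question, with every punctuation character escaped.
escaped = re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", FLAGS)

# Reduced pattern: "]" goes first in the class, nothing else is escaped.
reduced = re.compile(r"""^(?P<nodeName>[]\w/:[<>@$]+).*?""", FLAGS)

# Hypothetical sample lines, made up for this check.
samples = ["top/u_core:reg[3]<0>@clk$x rest", "plain_name", "a b", "#not a node"]


def node_name(pattern, text):
    """Return the captured nodeName, or None when the line does not match."""
    m = pattern.match(text)
    return m.group("nodeName") if m else None


results = [(node_name(escaped, s), node_name(reduced, s)) for s in samples]
for sample, (old, new) in zip(samples, results):
    print(repr(sample), old, new)
```

Both patterns should capture the same text on every line, since the two character classes contain the same set of characters.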
    

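    Note that inside a character class you never have to escape a non-word character, with four exceptions: ^, -, ], and \. (Shorthand classes such as \w keep their backslash, of course.) There are ways to keep even those (except for \) unescaped in the character class: put ] first, put - first or last, and put ^ anywhere but first.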
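
A minimal sketch of those positions: a ] placed first, a ^ placed anywhere but first, and a - placed last are all taken literally. The test string is made up.

```python
import re

# "]" first, "^" not first, "-" last: all three are literals here.
unescaped = re.compile(r"[]^-]+")

# The same set written with backslashes, for comparison.
escaped = re.compile(r"[\]\^\-]+")

text = "a]^-b"
print(unescaped.findall(text))
print(escaped.findall(text))

# A backslash has no such safe position; it is always written as "\\".
backslash = re.compile(r"[\\]")
print(backslash.findall(r"a\b"))
```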

    Outside a character class, you must escape \, [, (, ), +, $, ^, *, ?, ., and |. A { only needs escaping where it could be read as a quantifier such as {2}.
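
When the text to match literally is long, or only known at run time, the standard library's re.escape can add these backslashes for you. A small sketch, assuming a made-up literal string:

```python
import re

# Hypothetical literal text containing several metacharacters.
literal = "cost(a+b)*[1.5]$"

# Unescaped, "(a+b)" is a group with a quantifier, so this pattern
# does not find the literal text "cost(a+b)".
naive_hit = re.search("cost(a+b)", literal)
print(naive_hit)

# re.escape backslash-escapes the special characters in the string.
pattern = re.compile(re.escape(literal))
print(pattern.pattern)
print(pattern.fullmatch(literal) is not None)
```

re.escape is meant for text used outside a character class; inside a class, the placement rules above are usually shorter.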

    Note that / is not a special regex metacharacter in Python regex patterns, and does not have to be escaped.

    Use raw string literals when defining your regex patterns to avoid issues (like confusing word boundary r'\b' and a backspace '\b').
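
The \b confusion is easy to reproduce. In the sketch below (the sample text is invented), the only difference between the two patterns is the r prefix:

```python
import re

# In a normal string literal, "\b" is the backspace character (0x08).
plain = "\bcat\b"
# In a raw string the backslash survives, so the regex engine sees \b.
raw = r"\bcat\b"
print(len(plain), len(raw))

text = "concat cat"

# Word boundaries: only the standalone word matches.
raw_hits = re.findall(raw, text)
# Literal backspace characters: nothing in the text matches.
plain_hits = re.findall(plain, text)
print(raw_hits, plain_hits)
```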