How to fix "SyntaxWarning: invalid escape sequence" in Python?


I'm getting lots of warnings like this in Python:

DeprecationWarning: invalid escape sequence \A
  orcid_regex = '\A[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]\Z'

DeprecationWarning: invalid escape sequence \/
  AUTH_TOKEN_PATH_PATTERN = '^\/api\/groups'

DeprecationWarning: invalid escape sequence \
  """

DeprecationWarning: invalid escape sequence \.
  DOI_PATTERN = re.compile('(https?://(dx\.)?doi\.org/)?10\.[0-9]{4,}[.0-9]*/.*')

<unknown>:20: DeprecationWarning: invalid escape sequence \(

<unknown>:21: DeprecationWarning: invalid escape sequence \(

What do they mean? And how can I fix them?

In Python 3.12+ the warning category changed from DeprecationWarning to SyntaxWarning (see the Python 3.12 changelog):

SyntaxWarning: invalid escape sequence '\A'

Solution

  • \ is the escape character in Python string literals.

    For example if you want to put a tab character in a string you may use:

    >>> print("foo \t bar")
    foo      bar
    

    If you want to put a literal \ in a string you may use \\:

    >>> print("foo \\ bar")
    foo \ bar
    

    Or you may use a "raw string":

    >>> print(r"foo \ bar")
    foo \ bar
    

    You can't put a backslash in a string literal wherever you like. A backslash is only valid as part of one of the recognized escape sequences; anywhere else it triggers a DeprecationWarning (Python 3.6 to 3.11) or a SyntaxWarning (3.12+), and the documentation says it will eventually become a SyntaxError. For example, \A isn't a valid escape sequence:

    $ python3.6 -Wd -c '"\A"'
    <string>:1: DeprecationWarning: invalid escape sequence \A
    $ python3.12 -c '"\A"'
    <string>:1: SyntaxWarning: invalid escape sequence '\A'
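
The same warning can be observed from inside Python, which may help when hunting for offending literals. This is only a sketch: it assumes the suspect source is available as a string (the `source` variable and the `<example>` filename are made up for illustration), and the category printed depends on the Python version as described above.

```python
import warnings

# Source text containing an invalid escape sequence. It is written as a raw
# string here so that this example itself does not trigger the warning.
source = r'x = "\Afoo"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")        # do not let default filters hide it
    compile(source, "<example>", "exec")   # the warning is issued at compile time

for w in caught:
    # DeprecationWarning on 3.6 to 3.11, SyntaxWarning on 3.12+
    print(w.category.__name__, w.message)
```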
    

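    If your backslash sequence happens to match one of Python's valid escape sequences without you intending it (for example \b, which is a backspace in a string literal but a word boundary in a regular expression), that's even worse: the string is silently changed, with no error or warning.

    So whenever a string literal needs a literal backslash, use a raw string or \\.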
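
A small sketch of that silent corruption, using \b as the example. The variable names `wrong` and `right` are only illustrative.

```python
import re

# In a normal string literal, "\b" is the backspace character (U+0008),
# so Python raises no warning and the regex silently changes meaning.
wrong = "\bfoo\b"
# In a raw string the backslashes survive and reach the regex engine,
# where \b means "word boundary".
right = r"\bfoo\b"

print(len(wrong))  # 5: backspace, f, o, o, backspace
print(len(right))  # 7: every backslash is kept

print(re.search(wrong, "a foo b"))          # None
print(re.search(right, "a foo b").group())  # foo
```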

    It's important to remember that a string literal is still a string literal even if that string is intended to be used as a regular expression. Python's regular expression syntax supports many special sequences that begin with \. For example \A matches the start of a string. But \A is not valid in a Python string literal! This is invalid:

    my_regex = "\Afoo"
    

    Instead you should do this:

    my_regex = r"\Afoo"
    

    Docstrings are another case to remember: docstrings are string literals too, so invalid \ sequences trigger the same warning there. Use r"""raw strings""" for docstrings that must contain \.
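
A minimal sketch of a raw docstring. The function name and the path are hypothetical and exist only to put backslashes into the docstring.

```python
def windows_path_example():
    r"""Return a sample path such as C:\new\table.

    The r prefix keeps \n and \t above as a backslash followed by a letter,
    instead of turning them into a newline and a tab.
    """
    return r"C:\new\table"


# The first docstring line still contains the backslashes as written.
print(windows_path_example.__doc__.splitlines()[0])
```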