Tags: python, identifier, magic-methods, name-mangling

How can I check if an identifier is dunder or class-private (i.e. will be mangled)?


I'm writing a project that gives advice about variable names, and I want it to tell if a name matches any of the reserved classes of identifiers. The first one ("private") is pretty straightforward, just name.startswith('_'), but dunder and class-private names are more complicated. Is there any built-in function that can tell me? If not, what are the internal rules Python uses?

For dunder, checking name.startswith('__') and name.endswith('__') doesn't work because that would match '__' for example. Maybe a regex like ^__\w+__$ would work?

For class-private, name.startswith('__') doesn't work because dunder names aren't mangled, nor are names with just underscores like '___'. So it seems like I'd have to check if the name starts with two underscores, doesn't end with two underscores, and contains at least one non-underscore character. Is that right? In code:

name.startswith('__') and not name.endswith('__') and any(c != '_' for c in name)

I'm mostly concerned about the edge cases, so I want to make sure I get the rules 100% correct. I read "What is the meaning of single and double underscore before an object name?", but it doesn't go into enough detail.


Solution

  • Dunder

    Based on is_dunder_name in Objects/typeobject.c (using str.isascii, which was added in Python 3.7):

    len(name) > 4 and name.isascii() and name.startswith('__') and name.endswith('__')
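To make the edge cases concrete, here is a minimal sketch that wraps the expression above in a function and runs it over a few tricky names. The function name is_dunder and the sample names are chosen for illustration only; they are not part of CPython.

```python
def is_dunder(name: str) -> bool:
    """Hypothetical helper: the one-line check above, wrapped in a function."""
    return (
        len(name) > 4              # shortest possible dunder is '__x__'
        and name.isascii()         # str.isascii needs Python 3.7+
        and name.startswith("__")
        and name.endswith("__")
    )


# A few edge cases, including underscore-only and non-ASCII names.
for candidate in ("__init__", "__", "____", "_____", "__é__", "__x"):
    print(f"{candidate!r}: {is_dunder(candidate)}")
```

Note that five underscores ('_____') passes this check, because the expression only looks at the length and the two characters at each end.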
    

    Alternatively, the regex from the question, ^__\w+__$, would work, but it needs the re.ASCII flag so that \w matches only ASCII characters.
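A sketch of the regex variant, assuming the names DUNDER_RE and is_dunder_regex (both made up here). It uses fullmatch instead of the ^ and $ anchors, because $ also matches just before a trailing newline, so '__x__\n' would otherwise slip through.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, \w also matches
# non-ASCII word characters such as 'é'.
DUNDER_RE = re.compile(r"__\w+__", re.ASCII)


def is_dunder_regex(name: str) -> bool:
    """Hypothetical helper: regex version of the dunder check."""
    return DUNDER_RE.fullmatch(name) is not None


for candidate in ("__init__", "__", "____", "__é__", "__x__\n"):
    print(f"{candidate!r}: {is_dunder_regex(candidate)}")
```

For strings that are valid identifiers this should agree with the length-and-ends check, since \w+ requires at least one character between the two pairs of underscores.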

  • Class-private

    The rules are documented under Identifiers (Names):

    name.startswith('__') and not name.endswith('__')
    

    (Sidenote: not name.endswith('__') also guarantees at least one non-underscore character, because a name made up only of underscores that starts with '__' necessarily ends with '__' too.)
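One way to sanity-check the rule is to compare it with what the interpreter actually stores in a class namespace. The sketch below assumes a throwaway class called Demo and a helper called is_class_private; both names are invented for this example.

```python
def is_class_private(name: str) -> bool:
    """Hypothetical helper: the documented rule as a function."""
    return name.startswith("__") and not name.endswith("__")


class Demo:
    __spam = 1      # class-private: expected to be mangled
    ___x = 2        # three leading underscores still qualify
    __eggs__ = 3    # ends with two underscores: left alone
    ___ = 4         # only underscores: left alone


for attr in ("__spam", "___x", "__eggs__", "___"):
    # If the plain name is not in the class namespace, it was mangled.
    stored_as = attr if attr in vars(Demo) else f"_Demo{attr}"
    assert stored_as in vars(Demo)
    assert is_class_private(attr) == (stored_as != attr)
    print(f"{attr!r} is stored as {stored_as!r}")
```

The string literals in the loop are not mangled, since mangling applies only to identifiers that appear textually inside the class body.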

    There's also a C implementation, _Py_Mangle in Python/compile.c, which additionally checks for a dot. Strictly speaking, though, a name containing a dot is an "attribute reference", not a name. That'd be equivalent to:

    name.startswith('__') and not name.endswith('__') and '.' not in name
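For context, here is a rough pure-Python approximation of the whole transformation, not just the test for whether it applies. The function name mangle and its signature are assumptions made for this sketch. The handling of the class name follows the language reference: leading underscores are stripped from the class name, and nothing is mangled if the class name consists only of underscores.

```python
def mangle(class_name: str, name: str) -> str:
    """Hypothetical approximation of name mangling; not CPython's actual API."""
    # Names that are not class-private (or contain a dot) are returned as-is.
    if not name.startswith("__") or name.endswith("__") or "." in name:
        return name
    stripped = class_name.lstrip("_")
    if not stripped:
        # Class name is only underscores: no transformation is done.
        return name
    return f"_{stripped}{name}"


print(mangle("Demo", "__spam"))     # class-private name
print(mangle("_Demo", "__spam"))    # leading underscore in the class name
print(mangle("Demo", "__init__"))   # dunder name
print(mangle("__", "__spam"))       # underscore-only class name
```

If the tool needs to report the mangled form as well as flag the name, a helper along these lines might be a reasonable starting point, but it is worth checking against real classes for the Python versions you support.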
    

    P.S. I can barely read C, so take these translations with a grain of salt.