I'm writing a project that gives advice about variable names, and I want it to tell if a name matches any of the reserved classes of identifiers. The first one ("private") is pretty straightforward, just name.startswith('_'), but dunder and class-private names are more complicated. Is there any built-in function that can tell me? If not, what are the internal rules Python uses?
For dunder, checking name.startswith('__') and name.endswith('__') doesn't work because that would match '__', for example. Maybe a regex like ^__\w+__$ would work?
For class-private, name.startswith('__') doesn't work because dunder names aren't mangled, nor are names with just underscores like '___'. So it seems like I'd have to check if the name starts with two underscores, doesn't end with two underscores, and contains at least one non-underscore character. Is that right? In code:
name.startswith('__') and not name.endswith('__') and any(c != '_' for c in name)
I'm mostly concerned about the edge cases, so I want to make sure I get the rules 100% correct. I read "What is the meaning of single and double underscore before an object name?", but it doesn't go into enough detail.
For dunder names, based on is_dunder_name in Objects/typeobject.c (using str.isascii, available since Python 3.7):
len(name) > 4 and name.isascii() and name.startswith('__') and name.endswith('__')
Alternatively, the regex ^__\w+__$ would work, but it would need re.ASCII enabled to make sure \w only matches ASCII characters.
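As a rough sketch, the two approaches can be compared side by side on the edge cases. The names is_dunder and DUNDER_RE are just mine for illustration, and I use fullmatch in place of the ^...$ anchors (it also rejects a trailing newline, which $ would tolerate). For strings that are valid identifiers the two should agree:

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_], so non-ASCII letters do not match.
# fullmatch() stands in for the ^...$ anchors of the regex above.
DUNDER_RE = re.compile(r"__\w+__", re.ASCII)


def is_dunder(name: str) -> bool:
    """Length, ASCII, prefix and suffix checks, as in the expression above."""
    return (
        len(name) > 4
        and name.isascii()
        and name.startswith("__")
        and name.endswith("__")
    )


# Edge cases: too short, all underscores, non-ASCII, one-sided underscores.
CANDIDATES = ["__init__", "__", "____", "_____", "__é__", "__x", "_x_"]

for candidate in CANDIDATES:
    by_check = is_dunder(candidate)
    by_regex = bool(DUNDER_RE.fullmatch(candidate))
    print(f"{candidate!r}: check={by_check}, regex={by_regex}")
```

Note that both versions accept '_____' (five underscores), since \w matches an underscore and the length check only requires more than four characters.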
For class-private names, the rules are documented under Identifiers (Names):
name.startswith('__') and not name.endswith('__')
(Sidenote: not name.endswith('__') also ensures that the name contains at least one non-underscore character, since any all-underscore name of two or more characters ends with '__'.)
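One way to sanity-check that expression is to ask the compiler itself. This is only a sketch: is_class_private and is_mangled_by_compiler are hypothetical helper names, and it assumes a throwaway class called Demo (no leading underscores, so the mangled prefix is simply _Demo):

```python
def is_class_private(name: str) -> bool:
    """The documented rule: two leading underscores, no two trailing ones."""
    return name.startswith("__") and not name.endswith("__")


def is_mangled_by_compiler(name: str, class_name: str = "Demo") -> bool:
    """Assign `name` in a class body and check whether it was stored mangled."""
    namespace = {}
    # Only run this on trusted input: it executes generated source code.
    exec(f"class {class_name}:\n    {name} = 1\n", namespace)
    # A mangled name is stored as _ClassName followed by the original name.
    return f"_{class_name}{name}" in vars(namespace[class_name])


# Edge cases: extra underscores, dunder, underscore-only, single underscore.
CANDIDATES = ["__x", "__x_", "___x", "__x__", "__", "___", "_x", "x"]

for candidate in CANDIDATES:
    predicted = is_class_private(candidate)
    actual = is_mangled_by_compiler(candidate)
    print(f"{candidate!r}: predicted={predicted}, actual={actual}")
```

If the rule is right, the predicted and actual columns should match for every candidate.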
There's also a C implementation at _Py_Mangle in Python/compile.c, but it includes a check for a dot, even though, strictly speaking, a name with a dot is an "attribute reference", not a name. That'd be equivalent to:
name.startswith('__') and not name.endswith('__') and '.' not in name
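If the input is supposed to be a single identifier anyway, one option is to reject dotted strings up front with str.isidentifier() instead of carrying the dot check around. Here's a minimal sketch that combines the checks; classify_name and its return labels are made up for illustration, not anything from CPython:

```python
from typing import Optional


def classify_name(name: str) -> Optional[str]:
    """Return a label for the reserved class a name falls in, or None."""
    # A dotted string like '__a.b' is not a single identifier, so it is
    # rejected here rather than handled by a separate dot check.
    if not name.isidentifier():
        raise ValueError(f"not a single identifier: {name!r}")
    # Dunder: at least '__x__' and ASCII only.
    if len(name) > 4 and name.isascii() and name.startswith("__") and name.endswith("__"):
        return "dunder"
    # Class-private: two leading underscores, no two trailing ones.
    if name.startswith("__") and not name.endswith("__"):
        return "class-private"
    # Anything else with a leading underscore, including '__' and '___'.
    if name.startswith("_"):
        return "private"
    return None


for candidate in ["__init__", "__x", "_x", "___", "x"]:
    print(f"{candidate!r}: {classify_name(candidate)}")
```

The order of the checks matters: dunder and class-private names also start with a single underscore, so the "private" test has to come last.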
P.S. I can barely read C, so take these translations with a grain of salt.