When I write obj.attribute in Python, what exactly is the sequence of operations and lookups that Python performs to resolve this attribute?
I want to know when and how methods like __getattribute__ and __getattr__ are invoked and how metaclasses might influence attribute access.
A step-by-step explanation would really help me understand how Python objects work.
Python will normally retrieve the "expected" attribute, but, as you put it, there are many steps and fallbacks.
When retrieving an attribute for reading:
1. The first step is a call to the class's __getattribute__ method (looked up on type(obj), not on the instance), which is usually the default object.__getattribute__:
In [24]: class A:
    ...:     def __getattribute__(self, attrname):
    ...:         if attrname == "a":
    ...:             return 5
    ...:         raise AttributeError
    ...:
In [25]: a = A()
In [26]: a.a
Out[26]: 5
In [27]: a.b
---------------------------------------------------------------------------
AttributeError
But we usually do not override __getattribute__, and instead rely on the default implementation in object.__getattribute__, which performs steps 2-5 (but not step 6).
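A small check of step 1, using a hypothetical Point class made up for this sketch: calling object.__getattribute__ explicitly gives the same result as normal dot access, and because the hook is taken from type(obj), storing a function named __getattribute__ in the instance dict does not change how lookups behave.

```python
class Point:  # hypothetical example class
    def __init__(self, x):
        self.x = x


p = Point(3)

# Explicitly calling the default implementation matches normal dot access
print(object.__getattribute__(p, "x"), p.x)  # 3 3

# The hook is taken from type(p), so this instance-level entry is ignored by p.x
p.__getattribute__ = lambda name: "hijacked"
print(p.x)  # 3
```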
2. Inside __getattribute__, Python searches for the attribute in the class itself and then in its superclasses, in the order given by the MRO (method resolution order). If it is found, Python checks whether it is a data descriptor:
If the attribute found in the class (or superclasses) is an instance of a class that implements either __set__ or __delete__ (see the descriptor protocol), it is a DATA descriptor, and Python calls that class's __get__ method (if it defines one), passing the instance and its class as arguments. This happens even if the instance __dict__ has an entry with the same name.
Note that methods (functions defined in the class body) are non-data descriptors, while attributes declared with __slots__ in the class or its superclasses are exposed to Python code as data descriptors themselves.
In [29]: class A:
    ...:     @property
    ...:     def a(self):
    ...:         return 5
    ...:     # the "property" decorator always creates a data descriptor:
    ...:     # property defines `__set__` even when no setter is declared
    ...:
In [30]: a = A()
In [31]: # inject a value other than 5 directly into the instance:
In [32]: a.__dict__["a"] = 23
In [33]: # this bypasses the descriptor protocol when writing the 23
In [34]: a.a
Out[34]: 5
In [35]: # but the descriptor is still called!
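To see which kind of object step 2 is dealing with, the sketch below classifies class attributes by the methods defined on their type (not on the attribute itself), mirroring how the lookup inspects a descriptor's class. The helper descriptor_kind and the classes Logged and Example are hypothetical, written only for this illustration.

```python
def descriptor_kind(attr):
    """Hypothetical helper: classify a class attribute as step 2 would."""
    t = type(attr)  # the protocol methods are looked up on the attribute's type
    if hasattr(t, "__set__") or hasattr(t, "__delete__"):
        return "data descriptor"
    if hasattr(t, "__get__"):
        return "non-data descriptor"
    return "plain attribute"


class Logged:
    """Hypothetical hand-written data descriptor (defines __get__ and __set__)."""

    def __set_name__(self, owner, name):
        self.name = name  # remember the attribute name it was assigned to

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself
            return self
        return f"__get__({type(instance).__name__} instance, {owner.__name__})"

    def __set__(self, instance, value):
        raise AttributeError(f"{self.name} is read-only")


class Example:
    __slots__ = ("slot",)  # creates a member descriptor named "slot"
    logged = Logged()
    plain = 5

    @property
    def prop(self):
        return 5

    def method(self):
        return 5


for name in ("slot", "logged", "prop", "method", "plain"):
    print(name, "->", descriptor_kind(vars(Example)[name]))

print(Example().logged)  # __get__(Example instance, Example)
```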
3. If the attribute is not a data descriptor (or was not found in the class at all), Python searches the instance's __dict__ for a key equal to the attribute name.
If that key exists, its value is returned.
4. If not, Python checks whether the attribute found in the class (or superclasses) is a non-data descriptor (for example, a function, which defines __get__ but neither __set__ nor __delete__). If so, the value is obtained by calling its __get__ method, which, for a function, returns a bound method.
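Step 4 can be reproduced by hand: the function stored in the class dict is a non-data descriptor, and calling its __get__ with the instance returns a bound method equivalent to what attribute access produces. Greeter is a hypothetical class used only for this sketch.

```python
class Greeter:  # hypothetical example class
    def hello(self):
        return f"hello from {type(self).__name__}"


g = Greeter()
func = vars(Greeter)["hello"]      # the plain function stored in the class
bound = func.__get__(g, Greeter)   # what step 4 does when evaluating g.hello

print(bound())                     # hello from Greeter
print(bound.__self__ is g)         # True: the instance was bound as self
print(bound.__func__ is func)      # True: same underlying function
print(bound == g.hello)            # True: equivalent to normal attribute access
```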
Here is an example for steps 3 and 4: the method a defined in the class is shadowed by another function stored in the instance, and that function is called instead:
In [36]: class A:
    ...:     def a(self):
    ...:         return 5
    ...:
In [37]: a = A()
In [38]: a.a = lambda: 23
In [40]: a.a()
Out[40]: 23
Note that functions attached to instances like this don't get self inserted automatically: it is the descriptor protocol, applied when a function is retrieved from the class, that binds the instance as the self argument of the method call!
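If you do want a function stored on a single instance to receive self, you can bind it explicitly with types.MethodType, the same bound-method type the descriptor protocol produces. Counter and increment are hypothetical names for this sketch.

```python
import types


class Counter:  # hypothetical example class
    def __init__(self):
        self.count = 0


def increment(self):
    self.count += 1
    return self.count


c = Counter()
c.unbound = increment                     # stored as-is in c.__dict__, no binding
c.bound = types.MethodType(increment, c)  # explicitly bind c as self

print(c.bound())       # 1
print(c.unbound(c))    # 2: self has to be passed by hand
# c.unbound() would raise TypeError, because nothing inserts self
```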
5. Otherwise, if the attribute was found in the class (or a superclass, following the MRO) as a plain attribute that is not a descriptor, Python returns it as is. Only the __dict__ of type(instance) and of the classes in its MRO are searched: the algorithm is not re-run on the class itself, so an attribute that exists only on the class of the class (the metaclass), including a descriptor defined on the metaclass, is not found from the instance.
In [45]: class A:
    ...:     a = 5
    ...:
In [46]: a = A()
In [47]: a.a = 23 # creates an instance value of "23" overriding the "5" in the class
In [48]: a.__dict__ # shows the instance storage:
Out[48]: {'a': 23}
In [49]: a.a
Out[49]: 23
In [50]: del a.a # deletes the value in the instance, leaving the class attribute untouched
In [51]: a.a
Out[51]: 5
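A sketch of the metaclass point from step 5, using a hypothetical Meta metaclass and WithMeta class: attributes defined on the metaclass are found when accessed on the class (there the class plays the role of the instance, and its type is the metaclass), but a lookup from an instance only walks type(instance).__mro__, so they are invisible from there.

```python
class Meta(type):  # hypothetical metaclass
    greeting = "defined on the metaclass"

    @property
    def meta_prop(cls):
        return f"meta property of {cls.__name__}"


class WithMeta(metaclass=Meta):  # hypothetical class
    pass


obj = WithMeta()
print(WithMeta.greeting)          # defined on the metaclass
print(WithMeta.meta_prop)         # meta property of WithMeta
print(hasattr(obj, "greeting"))   # False: lookup from obj only walks type(obj).__mro__
print(hasattr(obj, "meta_prop"))  # False
```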
6. If all of this fails, __getattribute__ raises AttributeError. That exception is caught internally by Python's attribute-access machinery, which then calls the class's __getattr__ method, if there is one. (Note that this is __getattr__, not __getattribute__: all the steps above are encoded inside __getattribute__, and __getattr__ is only called afterwards.) If there is no __getattr__, or it also raises AttributeError, the exception propagates and the attribute is said not to exist.
In [55]: a = A()
In [56]: a.a
Out[56]: 5
In [57]: a.a = 23
In [58]: a.a
Out[58]: 23
In [59]: a.__dict__
Out[59]: {'a': 23}
In [60]: a.__dict__.clear()
In [61]: a.a
Out[61]: 5
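Since step 6 is triggered by any AttributeError coming out of __getattribute__, __getattr__ only runs for attributes the normal lookup cannot find. One consequence worth knowing: an AttributeError raised by mistake inside a property getter is also treated as "not found" and is routed to __getattr__. Config is a hypothetical class for this sketch.

```python
class Config:  # hypothetical example class
    def __init__(self):
        self.debug = False

    @property
    def broken(self):
        # A bug inside the getter: this AttributeError escapes __getattribute__
        raise AttributeError("bug inside the getter")

    def __getattr__(self, name):
        # Only reached after __getattribute__ (steps 1-5) raised AttributeError
        return f"<default for {name}>"


cfg = Config()
print(cfg.debug)    # False: found in the instance dict, __getattr__ not called
print(cfg.timeout)  # <default for timeout>
print(cfg.broken)   # <default for broken>: the getter's error is masked
```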
And to show that __getattr__ is called after __getattribute__, we can use this snippet:
In [8]: class A:
   ...:     def __getattribute__(self, attr):
   ...:         print("entering getattribute")
   ...:         try:
   ...:             res = super().__getattribute__(attr)
   ...:         except AttributeError:
   ...:             print("attribute error inside getattribute")
   ...:             raise
   ...:         finally:
   ...:             print("exiting getattribute")
   ...:         return res
   ...:     def __getattr__(self, attr):
   ...:         print("getattr")
   ...:         return attr
   ...:
In [9]: A().b
entering getattribute
attribute error inside getattribute
exiting getattribute
getattr
Out[9]: 'b'
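Putting steps 1-6 together, here is a simplified pure-Python model of what obj.name does. It is only an illustrative sketch: the names generic_getattribute, lookup and _find_in_mro are hypothetical, the real object.__getattribute__ is implemented in C, and several CPython details (attribute caching, special-method lookup, unusual descriptor setups) are ignored.

```python
_MISSING = object()  # sentinel meaning "not found"


def _find_in_mro(cls, name):
    """Return the first entry called `name` in the __dict__s along cls.__mro__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattribute(obj, name):
    """Simplified model of object.__getattribute__ (steps 2-5)."""
    cls = type(obj)
    attr = _find_in_mro(cls, name)
    attr_type = type(attr)
    found = attr is not _MISSING
    is_data = found and (hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__"))

    # Step 2: a data descriptor in the class wins over the instance dict
    if is_data and hasattr(attr_type, "__get__"):
        return attr_type.__get__(attr, obj, cls)

    # Step 3: the instance __dict__ (objects using only __slots__ have none)
    try:
        inst_dict = vars(obj)
    except TypeError:
        inst_dict = {}
    if name in inst_dict:
        return inst_dict[name]

    # Step 4: a non-data descriptor (e.g. a function) is bound via __get__
    if found and hasattr(attr_type, "__get__"):
        return attr_type.__get__(attr, obj, cls)

    # Step 5: a plain class attribute is returned unchanged
    if found:
        return attr

    raise AttributeError(f"{cls.__name__!r} object has no attribute {name!r}")


def lookup(obj, name):
    """Model of `obj.name`: step 1 (__getattribute__), then step 6 (__getattr__)."""
    cls = type(obj)
    hook = _find_in_mro(cls, "__getattribute__")  # looked up on the type
    try:
        if hook is vars(object)["__getattribute__"]:
            return generic_getattribute(obj, name)  # default: use the model
        return hook(obj, name)  # a user-defined override
    except AttributeError:
        fallback = _find_in_mro(cls, "__getattr__")
        if fallback is _MISSING:
            raise
        return fallback(obj, name)


class Demo:  # hypothetical class exercising every step
    plain = 5

    @property
    def prop(self):
        return "from property"

    def method(self):
        return "method called"

    def __getattr__(self, name):
        return f"fallback:{name}"


d = Demo()
d.__dict__["prop"] = "ignored"  # shadowed by the data descriptor (step 2)
d.__dict__["plain"] = 23        # shadows the class attribute (step 3)

for name in ("prop", "plain", "method", "nope"):
    print(name, lookup(d, name) == getattr(d, name))  # model agrees with Python
```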
It is important to note that metaclasses can't customize this behavior of calling __getattr__ after __getattribute__ has run all the steps above and raised AttributeError: that is hardcoded in the language.
(This is unlike instance creation, which goes through the metaclass's __call__ method; that method can be reimplemented to customize how the class's __new__ and __init__ are called.)
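For contrast, a minimal sketch of that customizable instance-creation path: a hypothetical TracingMeta metaclass overrides __call__ and delegates to type.__call__, which is what runs the class's __new__ and __init__. Widget and the events list are also hypothetical, used only to record the order of calls.

```python
events = []  # records the order in which things happen


class TracingMeta(type):  # hypothetical metaclass
    def __call__(cls, *args, **kwargs):
        events.append(f"TracingMeta.__call__({cls.__name__})")
        # type.__call__ runs cls.__new__ and then, if __new__ returned an
        # instance of cls, cls.__init__
        instance = super().__call__(*args, **kwargs)
        events.append("instance ready")
        return instance


class Widget(metaclass=TracingMeta):  # hypothetical class
    def __init__(self, size):
        events.append(f"Widget.__init__(size={size})")
        self.size = size


w = Widget(3)
print(events)
# ['TracingMeta.__call__(Widget)', 'Widget.__init__(size=3)', 'instance ready']
```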
As for metaclasses, as described in step 5, any modifications they make to attribute access are normally restricted to attributes of the classes themselves: instances of a class created with a custom metaclass won't have their attribute access easily modified by anything defined on the metaclass. However, customizing __getattr__ or __getattribute__, or creating descriptors, in the class itself does modify attribute access on its instances directly. Of course, metaclasses can do indirect things, like modifying the MRO itself (by overriding the metaclass's mro() method, which is not trivial either), and this could cause attributes (and the mechanisms described above) to be searched in classes other than the ones one would expect from the normal inheritance order.
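As a toy illustration of that indirect route (not a recommended pattern), here is a hypothetical ReversedBasesMeta metaclass that overrides mro() to reverse the order of the intermediate bases. Instance lookups (steps 2, 4 and 5) then follow whatever MRO the metaclass produced. Left, Right, Normal and Twisted are hypothetical classes for this sketch.

```python
class ReversedBasesMeta(type):  # hypothetical metaclass
    def mro(cls):
        default = super().mro()  # the normal C3 linearization, as a list
        # keep the class first and object last, reverse everything in between
        return [default[0], *reversed(default[1:-1]), default[-1]]


class Left:
    who = "Left"


class Right:
    who = "Right"


class Normal(Left, Right):
    pass


class Twisted(Left, Right, metaclass=ReversedBasesMeta):
    pass


print([k.__name__ for k in Normal.__mro__])   # ['Normal', 'Left', 'Right', 'object']
print([k.__name__ for k in Twisted.__mro__])  # ['Twisted', 'Right', 'Left', 'object']
print(Normal().who, Twisted().who)            # Left Right
```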