
Python __getattr__ executed multiple times


I've been trying to implement the __getattr__ function as in the following example:

PEP 562 -- Module __getattr__ and __dir__

And I don't get why this simple piece of code:

# lib.py

def __getattr__(name):
    print(name)

# main.py

from lib import test

outputs:

__path__
test
test

What is __path__? Why is it passed to __getattr__? Why is test passed twice?


Solution

  • TL;DR: the first "test" printed is a side effect of how "from ... import" is implemented, i.e. importlib probes the lib module with hasattr before the name is actually fetched. The second "test" comes from the subsequent access of the dynamic attribute on the module itself, which is the value that gets bound in main.py.

    Because importlib is implemented in Python, its frames show up in a stack trace. Modify your lib.py slightly to also dump a trace:

    # lib.py
    from traceback import print_stack
    
    def __getattr__(name):
        print_stack()
        print(name)
        print("-" * 80)
    

    Then add a simple main script within the same directory as lib.py:

    # main.py
    
    from lib import test
    

    Executing the script will pinpoint the library location within importlib which triggers double attribute access:

    $ python3 main.py 
      File "main.py", line 3, in <module>
        from lib import test
      File "<frozen importlib._bootstrap>", line 1019, in _handle_fromlist
      File "lib.py", line 5, in __getattr__
        print_stack()
    __path__
    --------------------------------------------------------------------------------
      File "main.py", line 3, in <module>
        from lib import test
      File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
      File "lib.py", line 5, in __getattr__
        print_stack()
    test
    --------------------------------------------------------------------------------
      File "main.py", line 3, in <module>
        from lib import test
      File "lib.py", line 5, in __getattr__
        print_stack()
    test
    --------------------------------------------------------------------------------
    

    Now we can find the answer by RTFS. Below I use Python v3.7.6; if you run a different version, check out the matching tag in git. Looking at importlib._bootstrap._handle_fromlist at the indicated line numbers, the __path__ access comes from importlib/_bootstrap.py:L1019 (note: this moved in 3.8, see #5873).

    _handle_fromlist is a helper intended to load package submodules in a from import. Step 1 is to see if the module is a package at all. A module is considered a package if it has a __path__ attribute:

    if hasattr(module, '__path__'):
    

    Because your __getattr__ returns None for all inputs instead of raising AttributeError, hasattr returns True here, so your module looks like a package. If hasattr had returned False, _handle_fromlist would abort at this point.
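
    To make the hasattr behaviour concrete, here is a minimal sketch that builds two throwaway module objects in memory instead of importing real files. The names make_module, returns_none, raises_for_unknown and lib_demo are made up for illustration and are not part of importlib. One hook mirrors the question's lib.py, the other raises AttributeError for names it does not provide:

```python
import types


def make_module(getattr_hook):
    """Build an in-memory module with a PEP 562 module-level __getattr__."""
    mod = types.ModuleType("lib_demo")  # hypothetical module name
    mod.__getattr__ = getattr_hook      # stored in the module's __dict__
    return mod


def returns_none(name):
    # Mirrors the question's lib.py: never raises, so every name "exists".
    return None


def raises_for_unknown(name):
    # Only "test" is provided dynamically; anything else is reported missing.
    if name == "test":
        return "dynamic value"
    raise AttributeError(f"module 'lib_demo' has no attribute {name!r}")


permissive = make_module(returns_none)
strict = make_module(raises_for_unknown)

# hasattr() is True whenever the lookup does not raise AttributeError.
permissive_has_path = hasattr(permissive, "__path__")
strict_has_path = hasattr(strict, "__path__")

print(permissive_has_path, strict_has_path)
print(strict.test)
```

    With the second hook the module no longer claims to have a __path__ attribute, so it stops looking like a package to the check quoted above.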

    The "fromlist" here will contain the name(s) requested by the import statement, ["test"] in our case, so we go into the for-loop with x="test" and on line 1032 there is the "extra" invocation:

    elif not hasattr(module, x):
    

    from lib import test will only attempt to load a lib.test submodule if lib does not already have a test attribute. This check is testing whether the attribute exists, to see if _handle_fromlist needs to attempt to load a submodule.

    Should you return different values for the first and second invocation of __getattr__ with name "test", then the second value returned is the one which will actually be received within main.py.
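
    A quick way to check this last point yourself is the sketch below. It writes a temporary module (the name pep562_demo_lib and the test_calls counter are assumptions made for this example) whose __getattr__ returns a different string on each request for "test", then looks at what the from-import actually bound. How many times the hook runs may depend on the interpreter version, so the sketch only relies on the bound value being the most recently returned one:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module; it counts how often "test" is requested.
LIB_SOURCE = '''
test_calls = 0

def __getattr__(name):
    global test_calls
    if name == "test":
        test_calls += 1
        return f"value from call #{test_calls}"
    return None  # like the question's lib.py: never raises
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pep562_demo_lib.py").write_text(LIB_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The statement under study: triggers the hasattr probe and the real access.
        from pep562_demo_lib import test
        import pep562_demo_lib  # plain import, only used to read the counter
    finally:
        sys.path.remove(tmp)

print("bound in this namespace:", test)
print("requests for 'test':", pep562_demo_lib.test_calls)
```

    The value printed first carries the number of the last call, which shows that the earlier hasattr probe's return value is discarded.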