In PEP 366 - Main module explicit relative imports, which introduced the module-scope variable `__package__` to allow explicit relative imports in submodules, there is the following excerpt:
> When the main module is specified by its filename, then the `__package__` attribute will be set to `None`. To allow relative imports when the module is executed directly, boilerplate similar to the following would be needed before the first relative import statement:
>
> if __name__ == "__main__" and __package__ is None:
>     __package__ = "expected.package.name"
>
> Note that this boilerplate is sufficient only if the top level package is already accessible via `sys.path`. Additional code that manipulates `sys.path` would be needed in order for direct execution to work without the top level package already being importable.
>
> This approach also has the same disadvantage as the use of absolute imports of sibling modules - if the script is moved to a different package or subpackage, the boilerplate will need to be updated manually. It has the advantage that this change need only be made once per file, regardless of the number of relative imports.
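For reference, I understand the "additional code that manipulates `sys.path`" mentioned in the excerpt to be something roughly like this for a module located directly inside its top level package (the path arithmetic is my own guess, not spelled out in the PEP; the package name matches my example below):

if __name__ == "__main__" and __package__ is None:
    import os
    import sys
    # Make the directory *containing* the top level package importable, so the
    # relative imports below can resolve; a more deeply nested submodule would
    # need to walk up more directory levels and use the dotted package name.
    sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    __package__ = "foo"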
I have tried to use this boilerplate in the following setting:
Directory layout:
foo
├── bar.py
└── baz.py
Contents of the bar.py submodule:
if __name__ == "__main__" and __package__ is None:
    __package__ = "foo"

from . import baz
The boilerplate works when executing the submodule bar.py from the file system (the `PYTHONPATH` modification makes the package foo/ accessible on `sys.path`):
PYTHONPATH=$(pwd) python3 foo/bar.py
The boilerplate also works when executing the submodule bar.py from the module namespace:
python3 -m foo.bar
However, the following alternative boilerplate works just as well in both cases as the contents of the bar.py submodule:
if __package__:
    from . import baz
else:
    import baz
Furthermore, this alternative boilerplate is simpler and does not require any update of the submodule bar.py when it is moved with the submodule baz.py to a different package (since it does not hard-code the package name `"foo"`).
So here are my questions about the boilerplate of PEP 366:
- Is the first subexpression `__name__ == "__main__"` necessary or is it already implied by the second subexpression `__package__ is None`?
- Shouldn’t the second subexpression `__package__ is None` be `not __package__` instead, in order to handle the case where `__package__` is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: `PYTHONPATH=$(pwd) python3 foo/`)?

The correct boilerplate is none: just write the explicit relative import and let the exception escape if someone tries to run the module as a script or has `sys.path` misconfigured:
from . import baz
The boilerplate given in PEP 366 is just there to show that the proposed change is sufficient to allow users to make direct execution* work if they really want to; it isn’t intended to suggest that making direct execution work is a good idea (it isn’t, it is a bad idea that will almost inevitably cause other problems, even with the boilerplate from the PEP).
Your proposed alternative boilerplate recreates the problem caused by implicit relative imports in Python 2: the baz module gets imported as `baz` from `__main__`, but will be imported as `foo.baz` everywhere else, so you end up with two copies in `sys.modules` under different names.
Amongst other problems, this means that if some other module throws `foo.baz.SomeException` and your `__main__` module tries to catch `baz.SomeException`, it won’t work, as those will be two different exception objects coming from two different modules.
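To make that concrete with the layout from the question, here is a sketch; the `SomeException` class and the direct `import foo.baz` (standing in for "some other module") are additions purely for illustration:

# Hypothetical contents of foo/baz.py, for illustration only
class SomeException(Exception):
    pass

# Hypothetical contents of foo/bar.py, using the alternative boilerplate.
# Run as: PYTHONPATH=$(pwd) python3 foo/bar.py
if __package__:
    from . import baz
else:
    import baz            # run directly: the file is imported as top-level "baz"

import foo.baz            # stand-in for "some other module" importing it as "foo.baz"
import sys

print("baz" in sys.modules, "foo.baz" in sys.modules)    # True True when run directly
try:
    raise foo.baz.SomeException("raised by the package copy")
except baz.SomeException:
    print("caught")                                       # never reached when run directly
except foo.baz.SomeException:
    print("different class, not caught by baz.SomeException")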
By contrast, if you use the PEP boilerplate, then `__main__` will correctly import baz as `foo.baz`, and the only thing you have to worry about is other modules potentially importing `foo.bar`.
If you want simpler boilerplate that explicitly guards against the "inadvertently making two copies of the same module under a different name" bug without hardcoding the package name, then you can use this:
if not __package__:
    raise RuntimeError(f"{__file__} must be imported as a package submodule")
However, if you are going to do that, you can just as well do `from . import baz` unconditionally as suggested above, and let the underlying exception escape if someone tries to run the script directly instead of via the `-m` switch.
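Concretely, the whole of bar.py can then stay as simple as the following sketch; the error noted in the comments is approximately what current CPython versions report, so treat the exact wording as indicative rather than guaranteed:

# foo/bar.py -- sketch of the recommendation above; no package name is hardcoded
if not __package__:
    raise RuntimeError(f"{__file__} must be imported as a package submodule")

from . import baz

# python3 -m foo.bar                    -> baz is imported as "foo.baz"
# PYTHONPATH=$(pwd) python3 foo/bar.py  -> the guard raises RuntimeError; without the
#   guard, the relative import itself fails with an ImportError along the lines of
#   "attempted relative import with no known parent package"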
* Direct execution means executing code from:

1. A file path argument (`python <file path>`).
2. A `-c` argument (`python -c <code>`).
3. The standard input (`python`).
4. A standard input redirection (`python < <file path>`).

Indirect execution means executing code from:

5. A directory or zip file path argument (`python <directory or zip file path>`).
6. A `-m` argument (`python -m <module name>`).
7. An import statement (`import <module name>`).
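If it is unclear which of these cases a particular invocation falls into, a throwaway diagnostic module makes it visible; probe.py here is a hypothetical file placed next to bar.py:

# foo/probe.py -- hypothetical diagnostic module
import sys

print(f"__name__    = {__name__!r}")
print(f"__package__ = {__package__!r}")
print(f"sys.path[0] = {sys.path[0]!r}")

# A few of the invocations above, for comparison:
#   PYTHONPATH=$(pwd) python3 foo/probe.py           (case 1, direct execution)
#   PYTHONPATH=$(pwd) python3 -m foo.probe           (case 6, indirect execution)
#   PYTHONPATH=$(pwd) python3 -c "import foo.probe"  (case 7, indirect execution)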
Now to answer your questions specifically:
- Is the first subexpression `__name__ == "__main__"` necessary or is it already implied by the second subexpression `__package__ is None`?
It is hard to get `__package__ is None` anywhere other than the `__main__` module with the modern import system. But it used to be a lot more common, as rather than being set by the import system on module load, `__package__` would instead be set lazily by the first explicit relative import executed in the module. In other words, the boilerplate is only trying to let direct execution work (cases 1 to 4 above), but `__package__ is None` used to imply direct execution or an import statement (case 7 above), so to filter out case 7 the subexpression `__name__ == "__main__"` (cases 1 to 6 above) was necessary.
- Shouldn’t the second subexpression `__package__ is None` be `not __package__` instead, in order to handle the case where `__package__` is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: `PYTHONPATH=$(pwd) python3 foo/`)?
No, because the boilerplate is only trying to let direct execution work (cases 1 to 4 above); it isn’t trying to let other flavours of `sys.path` misconfiguration pass silently.
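For completeness, that empty-string case is easy to reproduce with a hypothetical foo/__main__.py added to the layout from the question:

# foo/__main__.py -- hypothetical, added to the layout from the question
print(f"__package__ = {__package__!r}")

# PYTHONPATH=$(pwd) python3 foo/   (case 5) prints: __package__ = ''
# i.e. __package__ is falsy but not None, which is why "not __package__" and
# "__package__ is None" disagree for this invocation.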