
Best practice for packaging modules with sub-modules, most of them using the same libraries


I am currently packaging my own module for distribution. In general everything works fine, but the fine-tuning, i.e. best practice for structuring sub-modules, is giving me some trouble.

Assume the following module structure:

mdl
├── mdl
│   ├── __init__.py
│   ├── core.py
│   ├── sub_one
│   │   ├── __init__.py
│   │   └── core_sub_one.py
│   └── sub_two
│       ├── __init__.py
│       └── core_sub_two.py
├── README
└── setup.py

Core file headers

The header of core.py starts with:

import numpy as np

...some fairly large module code...

The headers of both core_sub_one.py and core_sub_two.py start with:

import numpy as np

from .. import core as cr

So all submodules require np and cr.
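To make the sharing concrete, here is a minimal sketch that rebuilds this layout in a temporary directory and checks that both submodules hold the very same core and library module objects. Assumptions: import math as np stands in for import numpy as np so the sketch runs without NumPy installed, and load_package and helper are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package(files):
    """Write files (relative path -> source) to a temp dir and import mdl fresh."""
    root = Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # Forget any previously imported mdl so the freshly written files are used.
    for name in [m for m in sys.modules if m == "mdl" or m.startswith("mdl.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("mdl")


# Assumption: "import math as np" stands in for "import numpy as np".
CORE = "import math as np\n\ndef helper():\n    return np.pi\n"
SUB = (
    "import math as np\n"
    "from .. import core as cr\n\n"
    "def some_function_in_{0}():\n"
    "    return cr.helper()\n"
)

files = {
    "mdl/__init__.py": "from . import sub_one as so\nfrom . import sub_two as st\n",
    "mdl/core.py": CORE,
}
for sub in ("sub_one", "sub_two"):
    files[f"mdl/{sub}/__init__.py"] = f"from . import core_{sub}\nfrom .core_{sub} import *\n"
    files[f"mdl/{sub}/core_{sub}.py"] = SUB.format(sub)

mdl = load_package(files)
one, two = mdl.sub_one.core_sub_one, mdl.sub_two.core_sub_two

# core.py and the library each ran once; every module holds a reference
# to the same cached module object in sys.modules.
print(one.cr is two.cr is sys.modules["mdl.core"])  # True
print(one.np is two.np is sys.modules["math"])  # True
```

Because Python caches every imported module in sys.modules, repeating import numpy as np in each submodule does not re-execute numpy; it only binds a local name to the cached module.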


__init__.py structure

The mdl/__init__.py (core-level) looks like:

from . import sub_one as so
from . import sub_two as st

The __init__.py of each submodule looks like this (with one replaced by two for sub_two):

from . import core_sub_one
from .core_sub_one import *

I "learnt" this structure from numpy; see e.g. numpy/ma/__init__.py.


Problem description

My trouble is with submodule access after running setup.py and importing my module with import mdl.
I can now access my submodules with e.g. mdl.so.some_function_in_sub_one(). This is expected and what I want.

But I can also access the top-level module core (imported as cr) and numpy through mdl.so.cr and mdl.so.np, which I want to avoid. Is there any way to avoid this? If not, is there any drawback to importing and connecting modules and submodules like this?
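A minimal sketch that reproduces the effect, assuming the same layout written to a temporary directory by a hypothetical load_package helper, with math standing in for numpy: because core_sub_one.py defines no __all__, the wildcard import in sub_one/__init__.py copies every public name, including the imported modules np and cr, into the sub_one package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package(files):
    """Write files (relative path -> source) to a temp dir and import mdl fresh."""
    root = Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # Forget any previously imported mdl so the freshly written files are used.
    for name in [m for m in sys.modules if m == "mdl" or m.startswith("mdl.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("mdl")


# Assumption: "import math as np" stands in for "import numpy as np".
CORE = "import math as np\n\ndef helper():\n    return np.pi\n"
SUB = (
    "import math as np\n"
    "from .. import core as cr\n\n"
    "def some_function_in_{0}():\n"
    "    return cr.helper()\n"
)

files = {
    "mdl/__init__.py": "from . import sub_one as so\nfrom . import sub_two as st\n",
    "mdl/core.py": CORE,
}
for sub in ("sub_one", "sub_two"):
    files[f"mdl/{sub}/__init__.py"] = f"from . import core_{sub}\nfrom .core_{sub} import *\n"
    files[f"mdl/{sub}/core_{sub}.py"] = SUB.format(sub)

mdl = load_package(files)

# With no __all__ in core_sub_one.py, "from .core_sub_one import *" copies
# every name not starting with "_" into the sub_one package namespace.
public = sorted(n for n in vars(mdl.so) if not n.startswith("_"))
print(public)  # ['core_sub_one', 'cr', 'np', 'some_function_in_sub_one']
```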

And is there a best practice for importing libraries like numpy in submodules when every submodule requires them?

Edit:
Since some seem to have trouble with the fact that asking for best practice is opinion-based (which I know and intended, since imho most design decisions in real life are not clear binary 1-0 decisions), let me add:
I want to comply with the packaging style used in the scipy and, more specifically, the numpy ecosystem. So if these packages have found a solution to any of the questions I asked, that will be the most welcome solution for me.


Solution

  • First things first:

    from .core_sub_one import *

    DON'T DO THIS. Yes, even if you've seen it in some "big name" package or read it in some tutorial. This is officially considered bad practice (PEP 8 says wildcard imports should be avoided), and for good reasons: from experience, it's a maintenance hell.
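One common alternative, sketched here under assumptions (the layout is written to a temporary directory by a hypothetical load_package helper, and math stands in for numpy), is to re-export only the names you intend to be public with an explicit import in each sub-package's __init__.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package(files):
    """Write files (relative path -> source) to a temp dir and import mdl fresh."""
    root = Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # Forget any previously imported mdl so the freshly written files are used.
    for name in [m for m in sys.modules if m == "mdl" or m.startswith("mdl.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("mdl")


# Assumption: "import math as np" stands in for "import numpy as np".
CORE = "import math as np\n\ndef helper():\n    return np.pi\n"
SUB = (
    "import math as np\n"
    "from .. import core as cr\n\n"
    "def some_function_in_{0}():\n"
    "    return cr.helper()\n"
)

files = {
    "mdl/__init__.py": "from . import sub_one as so\nfrom . import sub_two as st\n",
    "mdl/core.py": CORE,
}
for sub in ("sub_one", "sub_two"):
    # Explicit re-export instead of "from .core_xxx import *".
    files[f"mdl/{sub}/__init__.py"] = f"from .core_{sub} import some_function_in_{sub}\n"
    files[f"mdl/{sub}/core_{sub}.py"] = SUB.format(sub)

mdl = load_package(files)

public = sorted(n for n in vars(mdl.so) if not n.startswith("_"))
print(public)  # ['core_sub_one', 'some_function_in_sub_one']
```

The submodule core_sub_one still appears as an attribute because importing a submodule binds it on its parent package, but np and cr no longer leak into mdl.so.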

    If you really, really insist on doing this (but seriously, don't), at least define an explicit __all__ variable in those modules so you keep the exposed names under control (it also helps document what's supposed to be part of the module's API).
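A sketch of the __all__ variant under the same assumptions (temporary directory, hypothetical load_package helper, math standing in for numpy): the wildcard import in __init__.py stays, but core_sub_one.py declares __all__, so only the listed names reach mdl.so, while np and cr remain reachable through mdl.so.core_sub_one.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package(files):
    """Write files (relative path -> source) to a temp dir and import mdl fresh."""
    root = Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # Forget any previously imported mdl so the freshly written files are used.
    for name in [m for m in sys.modules if m == "mdl" or m.startswith("mdl.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("mdl")


# Assumption: "import math as np" stands in for "import numpy as np".
CORE = "import math as np\n\ndef helper():\n    return np.pi\n"
SUB = (
    '__all__ = ["some_function_in_{0}"]\n\n'  # the only name "import *" exports
    "import math as np\n"
    "from .. import core as cr\n\n"
    "def some_function_in_{0}():\n"
    "    return cr.helper()\n"
)

files = {
    "mdl/__init__.py": "from . import sub_one as so\nfrom . import sub_two as st\n",
    "mdl/core.py": CORE,
}
for sub in ("sub_one", "sub_two"):
    files[f"mdl/{sub}/__init__.py"] = f"from . import core_{sub}\nfrom .core_{sub} import *\n"
    files[f"mdl/{sub}/core_{sub}.py"] = SUB.format(sub)

mdl = load_package(files)

public = sorted(n for n in vars(mdl.so) if not n.startswith("_"))
print(public)  # ['core_sub_one', 'some_function_in_sub_one']
print(hasattr(mdl.so.core_sub_one, "np"))  # True
```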

    "But I can also access the top level module cr and numpy with mdl.so.cr and mdl.so.np, which I want to avoid. Is there any way to avoid this?"

    Not really. If you're really worried about it, you can import those names as "protected" in your submodules:

    # core_sub_xxx.py
    
    import numpy as _np
    from .. import core as _cr
    

    (Of course you'll have to replace all occurrences of np and cr with _np and _cr, but any half-decent text editor can do this.)

    This doesn't prevent access to mysubmodule._cr or mysubmodule._np, but at least it makes it clear that one should NOT access those names.

    But really, this is not a big issue, as long as your API is clearly documented.
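For reference, a sketch of the underscore approach under the same assumptions (temporary directory, hypothetical load_package helper, math standing in for numpy). It also shows a side effect relevant to the wildcard import: when a module defines no __all__, from module import * skips names that start with an underscore, so _np and _cr no longer leak into mdl.so, while they stay accessible, though clearly marked internal, on core_sub_one itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package(files):
    """Write files (relative path -> source) to a temp dir and import mdl fresh."""
    root = Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # Forget any previously imported mdl so the freshly written files are used.
    for name in [m for m in sys.modules if m == "mdl" or m.startswith("mdl.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("mdl")


# Assumption: "import math as ..." stands in for "import numpy as ...".
CORE = "import math as np\n\ndef helper():\n    return np.pi\n"
SUB = (
    "import math as _np\n"  # "protected" aliases, no __all__ defined
    "from .. import core as _cr\n\n"
    "def some_function_in_{0}():\n"
    "    return _cr.helper()\n"
)

files = {
    "mdl/__init__.py": "from . import sub_one as so\nfrom . import sub_two as st\n",
    "mdl/core.py": CORE,
}
for sub in ("sub_one", "sub_two"):
    files[f"mdl/{sub}/__init__.py"] = f"from . import core_{sub}\nfrom .core_{sub} import *\n"
    files[f"mdl/{sub}/core_{sub}.py"] = SUB.format(sub)

mdl = load_package(files)

print(hasattr(mdl.so, "_np"), hasattr(mdl.so, "_cr"))  # False False
print(hasattr(mdl.so.core_sub_one, "_cr"))  # True
```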