Tags: python, dockerfile, python-venv, python-install

Execute script that uses my local package - ImportErrors


My project had been up and running in a Kubernetes container for a while... until I decided to "clean up" the sys.path.append calls I had at the top of my modules. This included describing my dependencies in pyproject.toml and ditching setup.py altogether (it imported setuptools and called setup() when run as __main__).

The design intent is not to run anything in /app/tnc as a script, but rather to treat it as a collection of modules, i.e. a package. The only part of the codebase that serves as a __main__ is the api.py file, which initializes and starts Flask.

Implementation

I have a lean deployment setup that consists of the following:

  1. the virtual environment with the core dependencies in /opt/venv
  2. my package /app/tnc
  3. and the entry point /app/bin/api

I kick off the Flask app with: python /app/bin/api.

The build takes place in the python:3.11-slim Docker image. There I install the recommended gcc and specify the following in the Dockerfile:

# build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY pyproject.toml pyproject.toml
RUN pip3 install -e .   # aside: better would be python -m pip install -e .

I then copy the following from the build into my runtime image.

# runtime
ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONPATH="/opt/venv/bin:/app/tnc"
COPY --chown=appuser:appuser bin bin
COPY --chown=appuser:appuser tnc tnc
COPY --chown=appuser:appuser config.py config.py
COPY --from=builder /opt/venv/ /opt/venv

As I mentioned, in the kubernetes deployment I fire-up the container with:

command: ["python3"]
args: ["bin/api"]

My observations while working toward the solution

Firing up the container so that I can run the Python REPL, the first symptom is an AttributeError when importing werkzeug (pulled in via flask):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/__init__.py", line 2, in <module>
    from .test import Client as Client
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/test.py", line 35, in <module>
    from .sansio.multipart import Data
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/sansio/multipart.py", line 19, in <module>
    class Preamble(Event):
  File "/usr/local/lib/python3.10/dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "/usr/local/lib/python3.10/dataclasses.py", line 1093, in _process_class
    str(inspect.signature(cls)).replace(' -> None', ''))
AttributeError: module 'inspect' has no attribute 'signature'

The second symptom is a ModuleNotFoundError: No module named 'tnc':

appuser@tnc-py-deployment-set-1:/app$ echo $PYTHONPATH
/opt/venv/bin
appuser@tnc-py-deployment-set-1:/app$ echo $PATH
/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
appuser@tnc-py-deployment-set-1:/app$ python -m /app/bin/api
/opt/venv/bin/python: No module named /app/bin/api
appuser@tnc-py-deployment-set-1:/app$ python /app/bin/api
Traceback (most recent call last):
  File "/app/bin/api", line 12, in <module>
    from tnc.s3 import S3Session
ModuleNotFoundError: No module named 'tnc'
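The second failure follows from how Python builds sys.path for a script: the script's own directory (/app/bin here) goes at the front, not the working directory or the project root, so the sibling tnc package is invisible. A small reproduction, using throwaway temp directories rather than the real layout:

```python
import os
import subprocess
import sys
import tempfile

# Recreate the shape of the layout (hypothetical paths): a bin/api script
# next to a sibling package, mirroring /app/bin/api and /app/tnc.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "bin"))
os.makedirs(os.path.join(root, "pkg"))
open(os.path.join(root, "pkg", "__init__.py"), "w").close()
script = os.path.join(root, "bin", "api")
with open(script, "w") as f:
    f.write("import pkg\n")

# Running the file as a script puts bin/ (the script's directory) at
# sys.path[0], so the sibling package is not importable.
bad = subprocess.run([sys.executable, script],
                     capture_output=True, text=True)
print("ModuleNotFoundError" in bad.stderr)   # True

# With the project root on PYTHONPATH, the same import succeeds.
env = dict(os.environ, PYTHONPATH=root)
good = subprocess.run([sys.executable, script],
                      capture_output=True, text=True, env=env)
print(good.returncode)                        # 0
```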

The project structure

├── bin
│   └── api
├── config.py
├── pyproject.toml
└── tnc
    ├── __init__.py
    ├── data
    │   ├── __init__.py
    │   ├── download.py
    │   ├── field_types.py
    │   └── storage_providers
    ├── errors.py
    ├── inspect
    │   ├── __init__.py
    │   └── etl_time_index.py
    ├── test
    │   ├── __init__.py
    │   └── test_end-to-end.py
    ├── utils.py
    └── www
        ├── __init__.py
        └── routes
            ├── __init__.py
            ├── feedback.py
            ├── livez.py
            └── utils.py
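Note the tnc/inspect package in the tree. With /app/tnc on PYTHONPATH, import inspect resolves to that package instead of the standard library module, which is exactly the AttributeError: module 'inspect' has no attribute 'signature' shown above. A minimal reproduction of the shadowing, using a throwaway directory:

```python
import os
import subprocess
import sys
import tempfile

# An empty package named "inspect", like tnc/inspect in the project.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "inspect"))
open(os.path.join(root, "inspect", "__init__.py"), "w").close()

# PYTHONPATH entries are searched before the standard library, so the
# local package wins and has no "signature" attribute.
env = dict(os.environ, PYTHONPATH=root)
probe = subprocess.run(
    [sys.executable, "-c",
     "import inspect; print(hasattr(inspect, 'signature'))"],
    capture_output=True, text=True, env=env)
print(probe.stdout.strip())   # False
```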

pyproject.toml

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["./"]
exclude = [ "res", "notes" ]

[project]
dependencies = [ ... with version specs ]
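For reference, a fuller sketch of what a pyproject.toml along these lines might look like; the project name, version, and dependency entry are placeholders, not the real values:

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "tnc"        # placeholder: use the real project name
version = "0.1.0"   # placeholder
dependencies = [
    "flask",        # illustrative; pin versions as needed
]

[tool.setuptools.packages.find]
where = ["./"]
exclude = ["res", "notes"]
```

Note that setuptools requires the `[project]` table (at minimum `name` and `version`, or a `dynamic` declaration) for the build to succeed.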

Solution

  • First, I have to give a shout-out to the pyproject.toml + setuptools team: the documentation and the implementation have gotten good. They allowed me to be a lot more specific and "deterministic" :)) about my setup, not to mention a bit more aggressive in the build process.

    Fixing the "not found" errors

    The fix included the following:

    1. I updated the pyproject.toml with the following:
    [tool.setuptools.package-dir]
    tnc = "tnc"
    bin = "bin"
    
    # entry point (not required, but ergonomic)
    [project.scripts]
    run-api = "bin.api:main"
    

    I also added an __init__.py to mark each package directory.

    2. Perhaps not required, but I moved the config.py file into the bin directory; this location captures my design intent. Changes to the api.py file...
    # instantiate the config object using a string ref to the config.py
    app.config.from_object("bin.config.DevelopmentConfig")
    ...
    
    # added a def main() to enable the option of specifying an entry point
    def main():
        """Entry point for the [project.scripts] console script."""
        logging.basicConfig(level=logging.DEBUG)
        app.run(host=app.config['HOST'], port=app.config['PORT'])

    # still runnable directly as a script
    if __name__ == '__main__':
        main()
    
    3. In the Dockerfile I set the PYTHONPATH env value to "/app", the parent of both the tnc and bin directories. By no means a best practice, but given my determination to keep bin separate from tnc, it was the only approach that made sense. As a bonus, dropping /app/tnc from PYTHONPATH stops tnc/inspect from shadowing the standard library's inspect module, which is what caused the AttributeError above.

    Improved build process

    Finally, while there are a few well-known techniques to maximize cache reuse when building the Docker image, I want to call out how easy it was to know precisely what was going on during the build, made possible by the latest setuptools configured with pyproject.toml.

    A. It was trivial to first run the build using an empty stub where the app code would eventually go.

    # pyproject.dependencies.toml
    [tool.setuptools]
    packages = ["tnc"]
    

    ... paired with the two-phase build (the image is an official Docker Python image):

    # Make sure to use the venv from the python base img:
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    
    # phase 1: dependency build using an empty project dir
    COPY pyproject.dependencies.toml pyproject.toml
    RUN mkdir tnc
    RUN pip3 install .
    
    # phase 2: full and final build
    COPY bin bin
    COPY tnc tnc
    COPY pyproject.toml pyproject.toml
    RUN pip3 install .
    

    B. It was clear what to copy from the now-consolidated build artifacts into the image used for distribution:

    COPY --from=builder --chown=appuser:appuser /app/build/lib/tnc tnc
    COPY --from=builder --chown=appuser:appuser /app/build/lib/bin bin
    COPY --from=builder --chown=root:root /opt/venv/ /opt/venv
    

    In the Kubernetes deployment, despite being able to call the entry point configured in pyproject.toml, I chose to call api.py as a script.

    # in the kube deployment for the image
    command: ["python"]
    args: ["/app/bin/api.py"]
    

    Conclusion

    I have an improved design that no longer includes "ad-hoc" calls to sys.path, nor resorts to "polluting" the PYTHONPATH. The single entry I now have, /app, conveys an important design choice: keeping the entry point in a separate root directory.