Tags: python, dockerfile, python-venv, python-install

Execute script that uses my local package - ImportErrors


My project had been up and running in a Kubernetes container for a while... until I decided to "clean up" the sys.path.append calls I had at the top of my modules. This included describing my dependencies in pyproject.toml and ditching setup.py altogether (it imported setuptools and called setup() when run as __main__).

The design intent is not to run anything in /app/tnc as a script, but rather to treat it as a collection of modules, i.e. a package. The only part of the codebase that serves as a __main__ is the api.py file, which initializes and starts Flask.

Implementation

I have a lean deployment setup that consists of the following:

  1. the virtual environment with the core dependencies in /opt/venv
  2. my package /app/tnc
  3. and the entry point /app/bin/api

I kick off the Flask app with: python /app/bin/api.

The build takes place in the python:3.11-slim Docker image. There I install the recommended gcc and specify the following in the Dockerfile:

# build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY pyproject.toml pyproject.toml
RUN pip3 install -e .   # aside: better would be python -m pip install -e .

I then copy the following from the build into my runtime image.

# runtime
ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONPATH="/opt/venv/bin:/app/tnc"
COPY --chown=appuser:appuser bin bin
COPY --chown=appuser:appuser tnc tnc
COPY --chown=appuser:appuser config.py config.py
COPY --from=builder /opt/venv/ /opt/venv

As I mentioned, in the kubernetes deployment I fire-up the container with:

command: ["python3"]
args: ["bin/api"]

My observations while working toward the solution

Firing up the container so that I can run the Python REPL, the first symptom is an AttributeError when importing werkzeug (pulled in via flask):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/__init__.py", line 2, in <module>
    from .test import Client as Client
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/test.py", line 35, in <module>
    from .sansio.multipart import Data
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/sansio/multipart.py", line 19, in <module>
    class Preamble(Event):
  File "/usr/local/lib/python3.10/dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "/usr/local/lib/python3.10/dataclasses.py", line 1093, in _process_class
    str(inspect.signature(cls)).replace(' -> None', ''))
AttributeError: module 'inspect' has no attribute 'signature'

The second symptom is a ModuleNotFoundError: No module named 'tnc':

appuser@tnc-py-deployment-set-1:/app$ echo $PYTHONPATH
/opt/venv/bin
appuser@tnc-py-deployment-set-1:/app$ echo $PATH
/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
appuser@tnc-py-deployment-set-1:/app$ python -m /app/bin/api
/opt/venv/bin/python: No module named /app/bin/api
appuser@tnc-py-deployment-set-1:/app$ python /app/bin/api
Traceback (most recent call last):
  File "/app/bin/api", line 12, in <module>
    from tnc.s3 import S3Session
ModuleNotFoundError: No module named 'tnc'
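The second failure follows from how Python builds sys.path for a script: the script's own directory (/app/bin here) goes at the front, not the working directory or the project root, so the sibling tnc package is invisible. A small reproduction, using throwaway temp directories rather than the real layout:

```python
import os
import subprocess
import sys
import tempfile

# Recreate the shape of the layout (hypothetical paths): a bin/api script
# next to a sibling package, mirroring /app/bin/api and /app/tnc.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "bin"))
os.makedirs(os.path.join(root, "pkg"))
open(os.path.join(root, "pkg", "__init__.py"), "w").close()
script = os.path.join(root, "bin", "api")
with open(script, "w") as f:
    f.write("import pkg\n")

# Running the file as a script puts bin/ (the script's directory) at
# sys.path[0], so the sibling package is not importable.
bad = subprocess.run([sys.executable, script],
                     capture_output=True, text=True)
print("ModuleNotFoundError" in bad.stderr)   # True

# With the project root on PYTHONPATH, the same import succeeds.
env = dict(os.environ, PYTHONPATH=root)
good = subprocess.run([sys.executable, script],
                      capture_output=True, text=True, env=env)
print(good.returncode)                        # 0
```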

The project structure

├── bin
│   └── api
├── config.py
├── pyproject.toml
└── tnc
    ├── __init__.py
    ├── data
    │   ├── __init__.py
    │   ├── download.py
    │   ├── field_types.py
    │   └── storage_providers
    ├── errors.py
    ├── inspect
    │   ├── __init__.py
    │   └── etl_time_index.py
    ├── test
    │   ├── __init__.py
    │   └── test_end-to-end.py
    ├── utils.py
    └── www
        ├── __init__.py
        └── routes
            ├── __init__.py
            ├── feedback.py
            ├── livez.py
            └── utils.py
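Note the tnc/inspect package in the tree. With /app/tnc on PYTHONPATH, import inspect resolves to that package instead of the standard library module, which is exactly the AttributeError: module 'inspect' has no attribute 'signature' shown above. A minimal reproduction of the shadowing, using a throwaway directory:

```python
import os
import subprocess
import sys
import tempfile

# An empty package named "inspect", like tnc/inspect in the project.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "inspect"))
open(os.path.join(root, "inspect", "__init__.py"), "w").close()

# PYTHONPATH entries are searched before the standard library, so the
# local package wins and has no "signature" attribute.
env = dict(os.environ, PYTHONPATH=root)
probe = subprocess.run(
    [sys.executable, "-c",
     "import inspect; print(hasattr(inspect, 'signature'))"],
    capture_output=True, text=True, env=env)
print(probe.stdout.strip())   # False
```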

pyproject.toml

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["./"]
exclude = [ "res", "notes" ]

[project]
dependencies = [ ... with version specs ]
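For reference, a fuller sketch of what a pyproject.toml along these lines might look like; the project name, version, and dependency entry are placeholders, not the real values:

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "tnc"        # placeholder: use the real project name
version = "0.1.0"   # placeholder
dependencies = [
    "flask",        # illustrative; pin versions as needed
]

[tool.setuptools.packages.find]
where = ["./"]
exclude = ["res", "notes"]
```

Note that setuptools requires the `[project]` table (at minimum `name` and `version`, or a `dynamic` declaration) for the build to succeed.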

Solution

  • First, I have to give a shout-out to the pyproject.toml + setuptools team: the documentation and the implementation have gotten good. They allowed me to be a lot more specific and "deterministic" :)) about my setup, not to mention a bit more aggressive in the build process.

    Fixing the "not found" errors

    The fix included the following:

    1. I updated the pyproject.toml with the following:
    [tool.setuptools.package-dir]
    tnc = "tnc"
    bin = "bin"
    
    # entry point (not required, but ergonomic)
    [project.scripts]
    run-api = "bin.api:main"
    

    I also added an __init__.py to mark each package directory.

    2. Perhaps not required, but I moved the config.py file into the bin directory; this location captures my design intent. Changes to the api.py file...
    # instantiate the config object using a string ref to the config.py
    app.config.from_object("bin.config.DevelopmentConfig")
    ...
    
    # added a def main() to enable the option of specifying an entry point
    def main():
        """Entry point for the [project.scripts] console script."""
        logging.basicConfig(level=logging.DEBUG)
        app.run(host=app.config['HOST'], port=app.config['PORT'])

    # still runnable directly as a script
    if __name__ == '__main__':
        main()
    
    3. In the Dockerfile I set the PYTHONPATH env value to "/app", the parent of both the tnc and bin directories. By no means a best practice, but given my determination to keep bin separate from tnc, it was the only approach that made sense. As a bonus, dropping /app/tnc from PYTHONPATH stops tnc/inspect from shadowing the standard library's inspect module, which is what caused the AttributeError above.

    Improved build process

    Finally, while there are a few well-known techniques to maximize cache reuse when building the Docker image, I want to call out how easy it was to know precisely what was going on during the build, made possible by the latest setuptools configured with pyproject.toml.

    A. It was trivial to first run the build using an empty stub where the app code would eventually go.

    # pyproject.dependencies.toml
    [tool.setuptools]
    packages = ["tnc"]
    

    ... paired with the two-phase build (the image is an official Docker Python image):

    # Make sure to use the venv from the python base img:
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    
    # phase 1: dependency build using an empty project dir
    COPY pyproject.dependencies.toml pyproject.toml
    RUN mkdir tnc
    RUN pip3 install .
    
    # phase 2: full and final build
    COPY bin bin
    COPY tnc tnc
    COPY pyproject.toml pyproject.toml
    RUN pip3 install .
    

    B. It was clear what to copy from the now-consolidated build artifacts into the image used for distribution:

    COPY --from=builder --chown=appuser:appuser /app/build/lib/tnc tnc
    COPY --from=builder --chown=appuser:appuser /app/build/lib/bin bin
    COPY --from=builder --chown=root:root /opt/venv/ /opt/venv
    

    In the Kubernetes deployment, despite being able to call the entry point configured in pyproject.toml, I chose to call api.py as a script.

    # in the kube deployment for the image
    command: ["python"]
    args: ["/app/bin/api.py"]
    

    Conclusion

    I have an improved design that no longer includes "ad-hoc" calls to sys.path, nor resorts to "polluting" the PYTHONPATH. The single entry I now have, /app, conveys an important design choice: keeping the entry point in a separate root directory.