python, docker, microservices, streaming, monorepo

How can I efficiently share schemas and utils between microservices in a monorepo?


Hi, I am trying to build a live prediction pipeline with YOLO. The goal is to stream the data, with some kind of transformation, from inference to a final frontend.

The flow should be like this:

  1. Model training (separate process)
  2. Inference, which saves the data in Postgres
  3. ETL process, which reads the data from Postgres, transforms it, and saves it to a new table in Postgres
  4. An API exposes the final table
  5. Frontend uses the API to show the data in a dashboard

The idea is to put everything into a microservice structure to have independent and scalable components. I know that at a bigger scale an architecture with Kafka and Spark would be the more efficient way, but for this project I want to stick with a microservice architecture.

My problem now is that I want to share some utils and also some schemas between the services. My idea is to build a base image and use it as the base for all the containers that need the schemas. Because everything should end up in one product, I also want to keep it in a monorepo.

I also know that schema sharing between microservices is not best practice, but for this use case it would really help.
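
To make it concrete, a shared schema here is just a small data model, something like this (a hypothetical sketch of schemas/raw_data.py, assuming Pydantic; plain dataclasses would work the same way):

# services/shared/schemas/raw_data.py (hypothetical sketch)
from datetime import datetime

from pydantic import BaseModel


class Detection(BaseModel):
    """One YOLO detection written by the inference service and read by the ETL service."""
    frame_id: int
    timestamp: datetime
    class_name: str
    confidence: float
    bbox: tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels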

Here is a simplified idea of our structure (here with videos that are processed live):

.
├── data/
│   ├── weights
│   ├── model_data
│   └── inference_tests
├── model_training/
│   ├── train.py
│   ├── prep.py
│   └── eval.py
├── services/
│   ├── shared/
│   │   ├── Dockerfile
│   │   ├── schemas/
│   │   │   ├── stats.py
│   │   │   └── raw_data.py
│   │   └── db_utils
│   ├── inference/
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   ├── main.py
│   │   └── src/
│   │       └── all_stuff.py
│   ├── etl_process/
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   ├── main.py
│   │   └── src/
│   │       └── all_stuff.py
│   ├── backend_for_frontend/
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   ├── main.py
│   │   └── src/
│   │       └── all_stuff.py
│   └── frontend/
│       ├── Dockerfile
│       ├── pyproject.toml
│       ├── main.py
│       └── src/
│           └── all_stuff.py
└── docker-compose.yaml

In the end I want to combine everything with docker-compose like this:

version: "3.8"

services:
  # Base image for shared code
  shared-base:
    build:
      context: ./services/shared
      dockerfile: Dockerfile
    image: shared-base-image

  db:
    image: postgres:13
    volumes:
      - db_data:/var/lib/postgresql/data # Postgres data directory inside the container
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydb
    ports:
      - "5432:5432"

  inference:
    build:
      context: ./services/inference
      dockerfile: Dockerfile
    depends_on:
      - shared-base
      - db
    volumes:
      - ./data/video:/input_videos # not streaming yet
    environment:
      DB_HOST: db
      DB_USER: myuser
      DB_PASSWORD: mypassword
      DB_NAME: mydb

  etl-process:
    build:
      context: ./services/etl_process
      dockerfile: Dockerfile
    depends_on:
      - shared-base
      - db
    environment:
      DB_HOST: db
      DB_USER: myuser
      DB_PASSWORD: mypassword
      DB_NAME: mydb

  backend:
    build:
      context: ./services/backend_for_frontend
      dockerfile: Dockerfile
    depends_on:
      - shared-base
      - db
    ports:
      - "8000:8000"
    environment:
      DB_HOST: db
      DB_USER: myuser
      DB_PASSWORD: mypassword
      DB_NAME: mydb

  frontend:
    build:
      context: ./services/frontend
      dockerfile: Dockerfile
    ports:
      - "3000:3000"

volumes:
  db_data:

To have the shared modules and schemas, I want to use the base container I am building as the base image for the other containers that need the shared schemas and utils.

Here is how I want to implement it:

FROM python:3.9-slim-buster
WORKDIR /app
# bake the shared schemas and utils into the image so child images inherit them
COPY . /shared

And a service Dockerfile that depends on it:

FROM shared-base-image

RUN pip install uv

# the service's own code goes on top of the shared base
COPY . .

ENTRYPOINT ["uv", "run", "main.py"]

Now my final question: What would be the final structure for this workflow and design? Are there some design patterns which are really helpful?

With this structure I also face the issue that I cannot easily run the scripts and modules outside the container. Does it make sense to append paths (for example to sys.path) based on whether they exist?
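
What I mean is a fallback like this at the top of a service's entry point (hypothetical sketch; inside the container the shared code would already be in place):

# e.g. at the top of services/inference/main.py (hypothetical path-fallback hack)
import sys
from pathlib import Path

# when running outside the container, make ../shared importable
shared_dir = Path(__file__).resolve().parent.parent / "shared"
if shared_dir.exists() and str(shared_dir) not in sys.path:
    sys.path.append(str(shared_dir))

from schemas.raw_data import Detection  # noqa: E402  (hypothetical shared import)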

I could also have only one big src folder, but then all services would have the same dependencies, which would also be overhead.

Thanks already for your help.

I hope you can give me some input on how to structure this effectively; it is mainly about design and design patterns.


Solution

  • You should treat the shared library as an ordinary Python library. It does not need a Dockerfile, but it does need its own pyproject.toml. Then your other services can depend on it normally:

    # services/inference/pyproject.toml
    [project]
    dependencies = [
      "shared",
      ...
    ]

    # resolve the "shared" requirement from the sibling directory (uv-specific)
    [tool.uv.sources]
    shared = { path = "../shared" }
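
    For completeness, the shared project's own pyproject.toml can be very small; roughly (a sketch, assuming hatchling as the build backend and the schemas/ and db_utils/ layout from the tree above):

    # services/shared/pyproject.toml (sketch)
    [project]
    name = "shared"
    version = "0.1.0"
    requires-python = ">=3.9"
    dependencies = [
      "pydantic",   # or whatever the schema and db code actually needs
    ]

    [build-system]
    requires = ["hatchling"]
    build-backend = "hatchling.build"

    # ship the top-level packages shown in the tree
    [tool.hatch.build.targets.wheel]
    packages = ["schemas", "db_utils"]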
    

    This introduces the case where a Dockerfile needs to include content from outside its own directory. In the Compose file you need to change the build: { context: } to point to a parent directory, and change the dockerfile: to point back into the subdirectory:

    services:
      inference:
        build:
          context: services
          dockerfile: inference/Dockerfile
    

    and also change the Dockerfile COPY statements to reference the subdirectories:

    FROM python:3.13-slim
    # Install uv
    # https://docs.astral.sh/uv/guides/integration/docker/#installing-uv
    COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
    
    # Copy in the application and its libraries
    WORKDIR /app
    COPY shared/ shared/
    COPY inference/ inference/
    
    # Build it
    WORKDIR /app/inference
    RUN uv sync --frozen
    ENV PATH=/app/inference/.venv/bin:$PATH
    
    # Metadata to run it
    CMD ["inference"]
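
    The CMD ["inference"] at the end assumes the inference project declares a matching console-script entry point in its pyproject.toml, roughly like this (the entry point name and module path are assumptions about how the code is packaged):

    # excerpt from services/inference/pyproject.toml (names assumed)
    [project.scripts]
    inference = "main:main"   # i.e. a main() function in main.py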
    

    In this setup there is no "base Dockerfile". That pattern isn't supported well by Compose: depends_on controls container startup order, not image build order, so there is no guarantee shared-base-image exists when the other images are built. Your shared Python code probably isn't large, and so long as the first several lines of the Dockerfile are the same across your various services, the underlying Docker image layers can be shared anyway.


    I would also explore the merits of only using a single image. In your root directory you could have a pyproject.toml that depended on all of the subprojects, which would also bring in their Python entry point scripts. To the extent that you have large dependencies, this probably requires less disk space: a container shares space with its image, and you'll only have one copy of each dependency regardless of how many projects use them. Now a commit anywhere in your repo produces a single new image.
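
    A sketch of that root pyproject.toml, assuming uv workspaces and that each service's pyproject.toml declares the project name used here:

    # pyproject.toml at the repository root (sketch)
    [project]
    name = "yolo-monorepo"
    version = "0.1.0"
    requires-python = ">=3.9"
    dependencies = [
      "shared",
      "inference",
      "etl-process",
      "backend-for-frontend",
    ]

    # every directory under services/ with a pyproject.toml becomes a workspace member
    [tool.uv.workspace]
    members = ["services/*"]

    # resolve the dependencies above from the workspace rather than an index
    [tool.uv.sources]
    shared = { workspace = true }
    inference = { workspace = true }
    etl-process = { workspace = true }
    backend-for-frontend = { workspace = true }

    With no [build-system] table the root project itself isn't built or installed; uv just installs the members and their dependencies (including their entry point scripts) into one environment.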

    There also may be some value in splitting this setup up into separate repositories. If you can use a Python package repository or upload your library to PyPI, then you can use a simpler Dockerfile. You also won't be forced to rebuild and restart your frontend because the ETL job changed. The downside, such as it is, is that it's harder to make cross-service breaking changes, but this hopefully is a rare event (and proper semantic versioning on your library can mitigate the issues somewhat).
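
    With the library published that way, each service's Dockerfile no longer needs the parent build context; a sketch (assuming "shared" can be pulled from the index, and omitting any private-registry configuration):

    # services/inference/Dockerfile, with shared coming from a package index
    FROM python:3.13-slim
    COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

    # the build context is just services/inference again
    WORKDIR /app
    COPY . .

    # "shared" is now an ordinary locked dependency like any other
    RUN uv sync --frozen
    ENV PATH=/app/.venv/bin:$PATH

    CMD ["inference"]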