I'm building my first Docker multi-stage project (to reduce the size of the image), following this tutorial: https://pythonspeed.com/articles/multi-stage-docker-python/.
My Dockerfile is pretty simple:
RUN apt-get update
RUN apt-get -y --no-install-recommends install \
python3 python3-pip python3-venv
RUN python3 -m venv /opt/fwr
ENV PATH="/opt/fwr/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3-alpine3.18 AS build-image
WORKDIR /opt/fwr
COPY --from=compile-image /opt/fwr /opt/fwr
COPY *.py ./
ENV PATH="/opt/fwr/bin:$PATH"
CMD ["-u", "main.py"]
ENTRYPOINT ["python"]
All stages build fine, but when I try to run the container I get:
Traceback (most recent call last):
File "/opt/fwr/main.py", line 2, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
The question is: What did I do wrong? Thanks!
It looks like you're using a different base image for compiling than you are in the final image. Your Dockerfile as shown isn't valid -- it's missing the initial FROM line -- but it looks like you're probably using an Ubuntu variant.
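Ubuntu -- like most other Linux distributions -- is built around the glibc C library. Alpine, in order to reduce the size of the distribution, uses musl libc. When you build something under Ubuntu, it is very common for it to fail to run under Alpine, because the two environments use different C libraries and dynamic loaders, so compiled extension modules built against glibc generally can't be loaded under musl. A virtual environment is also tied to the Python interpreter that created it, so copying it into an image with a different Python version or install location can leave its packages invisible to the interpreter that actually runs.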
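If you want to check whether two stages are compatible in this sense, a small script along these lines could be run in each of them and the output compared. This is only a sketch: `describe_environment` is a name I made up, and `platform.libc_ver()` typically returns empty strings when it can't identify glibc (which is what I'd expect under musl, though I haven't verified that on every Alpine release).

```python
import platform
import sys
import sysconfig


def describe_environment():
    """Collect the details that should match between build and runtime stages."""
    libc_name, libc_version = platform.libc_ver()
    return {
        # Major.minor version decides the lib/pythonX.Y directory name.
        "python_version": "%d.%d" % sys.version_info[:2],
        # Empty name means glibc was not detected (assumed to hint at musl).
        "libc": libc_name or "unknown (possibly musl)",
        "libc_version": libc_version,
        # Where this interpreter looks for pure-Python packages.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the Python version, libc, or site-packages path differs between the compile stage and the final stage, copying a virtual environment from one to the other is unlikely to work.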
If you use the same base image for compiling things that you use in your final image, you'll find that things build and run as expected:
FROM python:3-alpine3.18 AS compile-image
RUN apk add alpine-sdk gfortran
RUN python3 -m venv /opt/fwr
ENV PATH="/opt/fwr/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3-alpine3.18 AS build-image
WORKDIR /opt/fwr
COPY --from=compile-image /opt/fwr /opt/fwr
COPY *.py ./
ENV PATH="/opt/fwr/bin:$PATH"
CMD ["-u", "main.py"]
ENTRYPOINT ["python"]
NB: Pandas doesn't provide binary wheels for Alpine, so everything needs to be built from source. That can take a long time. Because optimizing for size is often a wasted effort, you can substantially improve your build time if you just use the standard Python image instead:
FROM python:3
RUN python3 -m venv /opt/fwr
WORKDIR /opt/fwr
ENV PATH="/opt/fwr/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY *.py ./
CMD ["-u", "main.py"]
ENTRYPOINT ["python"]
Since we're not actually compiling anything here, we can use a single-stage image. Total build time is maybe a minute (probably faster if that Alpine build weren't still running in another terminal).
Update
The Alpine build finally finished; the final image sizes are:
pytest-alpine 341 MB
pytest-debian 1.15 GB
You might think, "wow, the Debian-based image is so much bigger!", but in practice, because you will often have many images built from the same base, the real size impact is minimal.
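To make the shared-base argument concrete, here is a rough back-of-the-envelope sketch. Only the 1150 MB figure comes from the measurement above; the 1000 MB base size and the count of five images are made-up assumptions for illustration, and `total_disk_mb` is a hypothetical helper, not anything Docker provides.

```python
def total_disk_mb(image_mb, shared_base_mb, image_count):
    """Disk used by several images when the base layers are stored once."""
    per_image_layers = image_mb - shared_base_mb
    return shared_base_mb + per_image_layers * image_count


# 1150 MB is the Debian-based image size reported above.
# The 1000 MB shared base size and the count of 5 images are assumptions.
IMAGE_MB = 1150
ASSUMED_BASE_MB = 1000
ASSUMED_IMAGE_COUNT = 5

if __name__ == "__main__":
    unshared = IMAGE_MB * ASSUMED_IMAGE_COUNT
    shared = total_disk_mb(IMAGE_MB, ASSUMED_BASE_MB, ASSUMED_IMAGE_COUNT)
    print(f"without layer sharing: {unshared} MB")  # 5750 MB
    print(f"with layer sharing: {shared} MB")       # 1750 MB
```

Under those assumptions, each additional image costs only its own layers on top of the base, which is why the headline size of a single image overstates the real cost.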