
Docker Image > 1GB in size from python:3.8.3-alpine


I'm pretty new to docker and, although I've read lots of articles, tutorials and watched YouTube videos, I'm still finding that my image size is in excess of 1 GB when the alpine image for Python is only about 25 MB (if I'm reading this correctly!).

I'm trying to work out how to make it smaller (if in fact it needs to be).

[Note: I've been following tutorials to create what I have below. Most of it makes sense .. but some of it feels like voodoo]

Here is my Dockerfile:

FROM python:3.8.3-alpine

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

RUN mkdir -p /home/app

RUN addgroup -S app && adduser -S app -G app

ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN mkdir $APP_HOME
RUN mkdir $APP_HOME/staticfiles
RUN mkdir $APP_HOME/mediafiles
WORKDIR $APP_HOME

RUN pip install --upgrade pip

COPY requirements.txt .

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client

RUN pip install -r requirements.txt

RUN apk del build-deps

COPY entrypoint.prod.sh $APP_HOME

COPY . $APP_HOME

RUN chown -R app:app $APP_HOME

USER app

ENTRYPOINT ["/home/app/web/entrypoint.prod.sh"]

Using Pillow and psycopg2-binary has caused a world of confusion and hurt. Particularly with the following:

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client

RUN pip install -r requirements.txt

RUN apk del build-deps

This was originally:

RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql \
    && apk add postgresql-dev \
    && apk add --update --no-cache postgresql-client \
    && pip install psycopg2-binary \
    && apk add jpeg-dev zlib-dev libjpeg \
    && pip install Pillow \
    && apk del build-deps

I really have no idea how much of the above I need to make it work. I think there might be a way of reducing the build.

I know there is a way to build the original image and then use that to transfer things over, but the only tutorials are confusing and I am struggling to get my head around this without adding more complexity. I really wish I had someone who could just explain it in person.

I also don't know if the size of the image is coming from the requirements.txt file. I'm using django and there are a number of requirements:

requirements.txt

asgiref==3.4.1
Babel==2.9.1
boto3==1.18.12
botocore==1.21.12
certifi==2021.5.30
charset-normalizer==2.0.4
crispy-bootstrap5==0.4
defusedxml==0.7.1
diff-match-patch==20200713
Django==3.2.5
django-anymail==8.4
django-compat==1.0.15
django-crispy-forms==1.12.0
django-environ==0.4.5
django-extensions==3.1.3
django-hijack==2.3.0
django-hijack-admin==2.1.10
django-import-export==2.5.0
django-money==2.0.1
django-recaptcha==2.0.6
django-social-share==2.2.1
django-storages==1.11.1
et-xmlfile==1.1.0
fontawesomefree==5.15.3
gunicorn==20.1.0
idna==3.2
jmespath==0.10.0
MarkupPy==1.14
odfpy==1.4.1
openpyxl==3.0.7
Pillow==8.3.1
psycopg2-binary==2.9.1
py-moneyed==1.2
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
s3transfer==0.5.0
six==1.16.0
sqlparse==0.4.1
stripe==2.60.0
tablib==3.0.0
urllib3==1.26.6
xlrd==2.0.1
xlwt==1.3.0

The question I have is, how do I make the image smaller. Does it need to be smaller?

I'm just trying to find the best way to deploy the Django app to DigitalOcean, and there is a world of confusion with so many approaches and tutorials. I don't know if using Docker makes it easier. Do I just use their App Platform? Will that provide SSL? What are the advantages of using Docker?

docker-compose file (for reference)

version: '3.7'

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile.prod
    command: gunicorn maffsguru.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    expose:
      - 8000
    env_file:
      - .env.docker
    depends_on:
      - db
  db:
    image: postgres:12.0-alpine
    env_file:
      - .env.docker
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    ports:
      - 5432:5432
  nginx:
    build: ./nginx
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    ports:
      - 1337:80
    depends_on:
      - web

volumes:
  postgres_data:
  static_volume:
  media_volume:

Just to say ... the above all seems to work ... but I don't know if the size of the image etc is going to be a problem?

I am also confused as to why Nginx seems to need me to go to http://0.0.0.0:1337 to view the site. Isn't the whole point to view it by navigating to http://0.0.0.0/?

Thanks for any advice or guidance you might be able to give and apologies for the random nature of my questions


Solution

  • Welcome to Docker! It can be quite the thing to wrap one's head around, especially when beginning, but your questions are all valid and pertinent.

    Reducing Size

    How to

    A great place to start is Docker's own Dockerfile best practices page:

    https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

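    It explains how instructions such as RUN, COPY and ADD each create a new layer, and every layer adds to the size of your image (instructions like ENV and WORKDIR only add metadata). Because layers are additive, deleting files in a later instruction does not shrink the layers that came before it, so your separate RUN apk del build-deps does not win back the space used by the earlier apk add. The key to a lot of minimisation is chaining commands in a single RUN statement with &&, so that installing and cleaning up happen in the same layer.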
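
    As a rough sketch of that chaining, the three separate RUN steps in your Dockerfile (apk add, pip install, apk del) could be folded into one, so the compiler and headers are removed in the same layer that installed them. The package names are taken from your question, except libpq, which is my assumption for the runtime library psycopg2 needs once postgresql-dev is removed. Treat the split between runtime and build-only packages as something to verify, not a tested recipe.

    ```dockerfile
    FROM python:3.8.3-alpine

    WORKDIR /home/app/web
    COPY requirements.txt .

    # Runtime libraries are added first and stay in the image.
    # Build-only packages go into the "build-deps" virtual package and are
    # deleted at the end of this same RUN, so they never persist in a layer.
    RUN apk add --no-cache libpq libjpeg postgresql-client \
        && apk add --no-cache --virtual build-deps \
            gcc python3-dev musl-dev postgresql-dev jpeg-dev zlib-dev \
        && pip install --no-cache-dir -r requirements.txt \
        && apk del build-deps
    ```

    The --no-cache-dir flag stops pip from keeping its download cache in the image, and apk's --no-cache fetches the package index on the fly without storing it, which is why no separate apk update is needed.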

    Something else I note in your Dockerfile is one specific line:

    COPY . $APP_HOME
    

    Now, depending on how you build your image (specifically, which folder you pass to Docker as the build context), this will copy EVERYTHING in that context. Chances are this is bringing in your venv folder etc., if you have one. I suspect this may be the largest contributor to the size for you. You can mitigate this by using explicit COPY instructions instead, or by adding a .dockerignore file.
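
    A hedged sketch of the explicit COPY option: maffsguru is the project package named in your gunicorn command and manage.py is the usual Django entry script, but I haven't seen your source tree, so you would add a line for each of your own app folders. Using --chown on COPY sets ownership as the files are copied, which avoids a later RUN chown -R writing a second copy of every file into a new layer.

    ```dockerfile
    # Fragment: assumes the app user/group, $APP_HOME and its staticfiles and
    # mediafiles folders were created earlier, as in the question's Dockerfile.
    WORKDIR $APP_HOME

    # The folders themselves are still owned by root, so hand them to the app
    # user. This is not recursive and they are empty, so the layer is tiny.
    RUN chown app:app $APP_HOME $APP_HOME/staticfiles $APP_HOME/mediafiles

    # Copy only what the app needs instead of the whole build context.
    COPY --chown=app:app entrypoint.prod.sh manage.py ./
    COPY --chown=app:app maffsguru/ ./maffsguru/
    # COPY --chown=app:app <your_app>/ ./<your_app>/   (placeholder: one line per Django app)

    USER app
    ```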

    I built your image (without any source code, and without copying in entrypoint.prod.sh), and it came out to 710MB as a base. It could be a good idea to check the size of your source code and see if anything else is getting in there. After I rearranged some of the commands to combine instructions, the image was 484MB, which is considerably smaller! If you get stuck, I can pop it into a gist on GitHub for you and walk you through it, but the Docker documentation should hopefully get you going.
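
    The "build the original image and then use that to transfer things over" approach you mention is usually called a multi-stage build. Here is a minimal sketch of the idea, not the Dockerfile I built above. The stage name builder and the /install prefix are arbitrary names I picked, and I am assuming libpq and libjpeg are the only runtime libraries the compiled packages need, which I have not tested against your requirements.txt.

    ```dockerfile
    # Stage 1: has the compilers and headers; it is discarded after the build.
    FROM python:3.8.3-alpine AS builder

    RUN apk add --no-cache gcc python3-dev musl-dev postgresql-dev jpeg-dev zlib-dev

    COPY requirements.txt .
    # Install everything under /install so it can be copied out in one go.
    RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

    # Stage 2: the image you actually ship, with no build tools in any layer.
    FROM python:3.8.3-alpine

    # Shared libraries that the compiled packages load at runtime.
    RUN apk add --no-cache libpq libjpeg postgresql-client

    # Bring over only the installed Python packages from the builder stage.
    COPY --from=builder /install /usr/local
    ```

    The rest of your Dockerfile (creating the user, copying the source, ENTRYPOINT) would follow in the second stage. Only that last stage ends up in the final image, so nothing has to be cleaned up by hand.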

    Why?

    Well, larger applications / images aren't inherently bad, but with any increase in data, some operations will be slower.

    When I say operations, I mainly mean pulling images from a registry, or pushing them to publish. It will take longer to transfer 1GB than it will 50MB.

    There's also a consideration to be made when you scale your containers. While the image size does not necessarily correlate directly with how much disk you will use when you start a container, a larger image will certainly increase the disk requirements of the machine you're running on, and may rule out running it on smaller devices.

    Docker

    The advantages of using Docker are widespread, and I can't cover them all here without submitting my writing for thesis defence ;-)

    But it mainly boils down to the following points:

    Nginx

    You've set things up well there, from what I can gather! I imagine nginx is 'telling you' (via the logs?) to navigate to 0.0.0.0 because that is the address it has bound to inside the container; it means "all interfaces", and in your browser you would use localhost instead. Now, you've forwarded traffic with 1337:80. Docker follows the format host:container, so this means that traffic on localhost:1337 is directed to the container's port 80. If you want the site at http://localhost/ with no port number, change the mapping to 80:80 (assuming nothing else on the host is using port 80 and your nginx configuration listens on port 80), and you will be able to navigate to localhost in your browser and see your website.

    Let me know if you need help with any of the above, or want more resources to aid you. Happy to correspond and walk you through anything anytime given we seem to be in the same timezone 🤙