docker, continuous-integration, python-manylinux

CI/CD: How should I run the initial setup steps (yum upgrade, dependency installation, etc.)? Should I use a custom Docker image or not?


I have just started experimenting with CI/CD. I want to create a CI/CD pipeline for my project that builds and tests my application on Linux, macOS, and Windows. For the Linux part, I need to use a specific Docker image (quay.io/pypa/manylinux2010_x86_64:latest). Before starting the build in the container, I do the usual setup (e.g., yum -y upgrade, install CMake, etc.), and this is where I am starting to get confused. To my understanding, and after spending some time Googling, the two most common ways to do this are the following:

1) Build a new Docker image which is based on quay.io/pypa/manylinux2010_x86_64:latest but also comes with the other dependencies installed. An example Dockerfile would be the following:

FROM quay.io/pypa/manylinux2010_x86_64:latest

# Update the base packages, then build and install CMake 3.15.3 from source.
RUN yum -y upgrade && \
    yum clean all && \
    rm -rf /var/cache/yum && \
    git clone https://github.com/Kitware/CMake.git && \
    cd CMake && \
    git checkout -b build v3.15.3 && \
    ./configure && \
    make && \
    make install && \
    cd .. && \
    rm -rf CMake

This image is built once and stored in a container registry. Then, every time the CI/CD pipeline runs, it pulls and uses this image.
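For example, with GitLab CI the pipeline job would then only need to reference the prebuilt image. A rough sketch follows; the registry path is only a placeholder for wherever the image from the Dockerfile above gets pushed:

# .gitlab-ci.yml (sketch) -- the image name below is a placeholder.
build-linux:
  image: myregistry.example.com/myproject/manylinux-build:latest
  script:
    - cmake --version          # CMake is already installed in the image
    - cmake -S . -B build
    - cmake --build build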

2) Use the quay.io/pypa/manylinux2010_x86_64:latest image in the CI/CD pipeline directly and make the yum -y upgrade and CMake installation commands part of the CI/CD pipeline scripts. This means that every time the CI/CD pipeline runs, it: (a) pulls the Docker image, (b) starts the container, and (c) runs yum and installs the dependencies.
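In GitLab CI terms, this would look roughly like the following sketch, with the same CMake build-from-source steps as in the Dockerfile above running on every pipeline execution:

# .gitlab-ci.yml (sketch) -- option 2: use the stock manylinux image and
# do the setup as part of the job script on every run.
build-linux:
  image: quay.io/pypa/manylinux2010_x86_64:latest
  script:
    - yum -y upgrade
    - git clone https://github.com/Kitware/CMake.git
    - cd CMake && git checkout -b build v3.15.3 && ./configure && make && make install
    - cd ..
    - cmake -S . -B build
    - cmake --build build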

FYI: I am mostly interested in the GitLab CI/CD and GitHub Actions services.


Solution

  • I would definitely opt for option 1).

    Option 2) has, in my opinion, the following disadvantages:

    1. It takes more CPU (you repeatedly do the same setup work), more time, and more bandwidth.
    2. It is more error-prone/unreliable: every little network outage of the distribution's package repository (or any other problem on that level) leads to a broken build. You want to keep your CI/CD pipeline as focused as possible on the build of the software itself; that is arguably the most important part. If a build fails, a big red light should go up in your office, and you want that to happen only for real code errors, not for silly network errors.
    3. It can be non-deterministic, so you might get a different version of one of the installed packages each time.
    4. In theory, for security reasons, it also makes sense to create a "master" image that is verified and secure and to use that. Every install (if not verified) might introduce security vulnerabilities.

    If you do professional software development, as you do, you need a Docker/container registry anyway to publish your build artifacts, which are normally packaged as containers. So I would build a "golden" build container, push it to your registry, and use it as the base for the builds in your CI/CD pipeline. But if this option is too difficult, then start with option 2).
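    As a rough sketch of the usage side, a GitHub Actions workflow that runs the Linux build inside such a prebuilt build container could look like the following; the registry path is a placeholder for wherever you push the image:

    # .github/workflows/linux-build.yml (sketch) -- the image name is a
    # placeholder for the "golden" build container in your registry.
    name: linux-build
    on: [push]

    jobs:
      build:
        runs-on: ubuntu-latest
        container:
          image: myregistry.example.com/myproject/manylinux-build:latest
        steps:
          - uses: actions/checkout@v2
          - name: Build with CMake (already installed in the container)
            run: |
              cmake -S . -B build
              cmake --build build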