angularjs · jenkins · docker · rancher · container-image

Best practice for build / deployment of docker images


I've just finished the basic pipeline for my Angular application, which runs in a Node image in Docker. The process works as follows: push to GitLab > hook triggers a Jenkins build > deployment script does a docker build and pushes the image to Quay > publish script prompts the Rancher service to upgrade the container and refresh the image > complete.

Now, the problem I have is that the base Node image is quite large, meaning that when I push a simple change it takes a long while for the build pipeline to complete (~8 minutes). This seems unreasonable for every tiny change, and the push to Quay and subsequent publish to the Rancher platform means I'm moving around 250 MB up to Quay and another 250 MB over to Rancher.

I have several "micro-services" planned for deployment, but if every deployment to a development environment means moving that much data around, it seems somewhat counterproductive... Am I doing something wrong? What am I missing? Are there any guidelines for best practice when building/deploying/hosting container-based services?


Solution

  • First some info on images, builds, registries and clients.

    Images and Layers

    Docker image builds work with layers. Each step in your Dockerfile commits a layer that is overlaid on top of the previous.

    FROM node                   ---- a6b9ffdcf522
    RUN apt-get update -y        --- 72886b467bd2
    RUN git clone whatever        -- 430615b3487a
    RUN npm install                - 4f8ddac8d3b5 mynode:latest
    

    Each layer that makes up an image is individually identified by a sha256 checksum. The IMAGE ID in docker images -a is a short snippet of that.

    Running dockviz images -t on a build host will give you a better idea of the tree of layers that can build up. While a build is running you can see a branch grow, and then the final layer is eventually tagged, but that layer stays in the tree and maintains a link to its parents.
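    To see the layers behind a tag, docker history lists them newest first. The output below is only illustrative, reusing the made-up IDs from the example above:

    $ docker history mynode:latest
    IMAGE          CREATED         CREATED BY                       SIZE
    4f8ddac8d3b5   2 minutes ago   /bin/sh -c npm install           85MB
    430615b3487a   2 minutes ago   /bin/sh -c git clone whatever    12MB
    72886b467bd2   3 minutes ago   /bin/sh -c apt-get update -y     40MB
    a6b9ffdcf522   2 weeks ago     /bin/sh -c #(nop) CMD ["node"]   670MB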

    Build Caching

    Docker builds are cached by default at each build step. If the RUN command in the Dockerfile hasn't changed, or the source files you are COPYing haven't changed, then that build step doesn't need to run again. The layer stays the same, as does its sha256 checksum ID, and Docker moves on to build the next layer.

    When docker gets to a step that does need to be rebuilt, the image "tree" that dockviz presents will branch off to create the new layer with a new checksum. Any steps after this then need to run again and create a layer on the new branch.
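    With the classic builder that looks something like the output below (IDs are illustrative): the unchanged steps are served from cache, and only the step that changed, plus everything after it, actually runs.

    $ docker build -t mynode:latest .
    Step 1/4 : FROM node
     ---> a6b9ffdcf522
    Step 2/4 : RUN apt-get update -y
     ---> Using cache
     ---> 72886b467bd2
    Step 3/4 : RUN git clone whatever
     ---> Using cache
     ---> 430615b3487a
    Step 4/4 : RUN npm install
     ---> Running in 1f2e3d4c5b6a
     ---> 9c1b2d3e4f5a
    Successfully built 9c1b2d3e4f5a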

    Registries

    Registries understand this layering too. If you only change the topmost layer in your newly tagged image, that's the only layer that should need to be uploaded to the registry (there are caveats to this; it works best with a recent Docker, 1.10.1+, and registry 2.3+). The registry will already have a copy of most of the image IDs that make up your new "image", and only the new layers will need to be sent.
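    You can see this when pushing a rebuilt image: unchanged layers report "Layer already exists" and only the new ones are uploaded. The repository name and layer IDs below are just placeholders:

    $ docker push quay.io/myorg/myapp:latest
    The push refers to repository [quay.io/myorg/myapp]
    9c1b2d3e4f5a: Pushed
    430615b3487a: Layer already exists
    72886b467bd2: Layer already exists
    a6b9ffdcf522: Layer already exists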

    Clients

    Docker registry clients deal with layers in the same way. When pulling an image, the client actually downloads the individual layers (blobs) that make up the image. You can see this from the list of image IDs printed when you docker pull or docker run a new image. Again, if most of the layers are the same, the update will only need to download the topmost layers that have changed, saving precious time.
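    The same is visible on the pull side: layers already present locally show "Already exists" and only the changed ones are downloaded (again, names and IDs are placeholders).

    $ docker pull quay.io/myorg/myapp:latest
    latest: Pulling from myorg/myapp
    a6b9ffdcf522: Already exists
    72886b467bd2: Already exists
    430615b3487a: Already exists
    9c1b2d3e4f5a: Pull complete
    Status: Downloaded newer image for quay.io/myorg/myapp:latest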

    Minimising build time

    So the things you want to focus on are:

    Keep the image sizes small

    The main way to save time is to not have anything to do in the first place. The less data in an image the better.

    If you can avoid using a full OS, do it. When you can run apps on the busybox or alpine image, it makes the Docker gods smile. alpine + a node.js build is less than 50MB. Go binaries are a great example of minimising size too: they can be statically compiled and have no dependencies, so they can even be run on the blank scratch image.
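    As a sketch, here is the same Node app on the alpine variant of the official image (the tag is illustrative, pick whichever Node version you actually target):

    FROM node:8-alpine
    WORKDIR /app
    COPY . /app/
    # --production skips devDependencies to keep the runtime image small
    RUN npm install --production && rm -rf ~/.npm
    CMD [ "node", "/app/server.js" ]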

    Take advantage of Docker's build caching

    It's important to have your most frequently changing artefacts (most likely your code) as a late entry in your Dockerfile. One little file change that invalidates the cache for an early build step forces that step and everything after it, potentially the complete 50MB of data, to be rebuilt, and that slows the build right down.

    There will always be some changes that invalidate the entire cache (like updating the base node image). These you just have to live with once in a while.

    Anything else in the build that updates infrequently should go towards the top of the Dockerfile.

    Make use of common "tagged" parent images

    Although image checksumming has been somewhat fixed from Docker 1.10 onwards, using a common parent image guarantees that you will be starting from the same shared image ID wherever you use that image with FROM.

    Prior to Docker 1.10, image IDs were just random UUIDs. If you had builds running on multiple hosts, layers could all be invalidated and replaced depending on which host built them, even if the layers were in fact identical.

    Common parent images also help when you have multiple services and multiple Dockerfiles that are largely the same. Whenever you start repeating build steps in multiple Dockerfiles, pull those steps out into a common parent image so layers are definitely shared between all your services. You are essentially getting this already by using the node image as your base.
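    As a sketch (the image names here are hypothetical), the repeated steps live in one parent Dockerfile that is built and pushed once, and every service simply starts FROM it:

    # node-base/Dockerfile - built once and pushed as myorg/node-base
    FROM node:8
    RUN apt-get update -y \
     && apt-get install -y build-essential \
     && rm -rf /var/lib/apt/lists/*

    # service-a/Dockerfile - shares all of the parent's layers
    FROM myorg/node-base
    WORKDIR /app
    COPY package.json /app/package.json
    RUN npm install
    COPY . /app/
    CMD [ "node", "/app/server.js" ]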

    Node.js tricks

    If you are running an npm install after your code deploy on every build and you have a number of dependencies, the npm install repeats a lot of work that doesn't actually change between builds. It can be worthwhile to build your node_modules before copying in your code, so the npm install only needs to run when package.json is updated:

    FROM node
    WORKDIR /app
    # copy package.json on its own first so the npm install layer is only
    # invalidated when dependencies change, not on every code change
    COPY package.json /app/package.json
    RUN npm install && rm -rf ~/.npm
    COPY . /app/
    CMD [ "node", "/app/server.js" ]
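
    Since COPY . /app/ sends the whole build context to the daemon, it is also worth adding a .dockerignore so local clutter (a host node_modules, .git and so on) doesn't bloat the context or needlessly invalidate that layer. A minimal sketch:

    # .dockerignore
    node_modules
    .git
    *.log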
    

    Staged build

    If you rely on npm packages with native modules, you will sometimes need to install a complete build toolchain in the container to run the npm install. Staged (multi-stage) builds can now easily separate the build image from the run image (Docker 17.05+).

    FROM node:8 AS build
    WORKDIR /build
    RUN apt-get update \
     && apt-get install -y build-essential
    COPY package.json /build/package.json
    RUN npm install \
     && rm -rf ~/.npm
    
    # Stage 2 app image
    FROM node:8-slim
    WORKDIR /app
    COPY --from=build /build/node_modules /app/node_modules
    COPY . /app/ 
    CMD [ "node", "/app/server.js" ]
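
    Only the final node:8-slim stage ends up in the tagged image; the heavier "build" stage stays behind on the build host as cache. Building and pushing works the same as any single-stage image (the repository name below is just a placeholder):

    $ docker build -t quay.io/myorg/myapp:latest .
    $ docker push quay.io/myorg/myapp:latest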
    

    Other things

    Make sure your build host has SSDs and a good internet connection, because there will be times you have to do a full rebuild, so the quicker that is, the better. AWS usually works well because the packages and images you are pulling and pushing are probably hosted on AWS as well. AWS also provides an image registry service (ECR) for only the cost of storage.
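    If you do try ECR, pushing is the usual tag-and-push flow once you've logged in. The account ID, region and repository name below are placeholders, and get-login-password needs a reasonably recent AWS CLI:

    $ aws ecr create-repository --repository-name myapp
    $ aws ecr get-login-password --region eu-west-1 \
        | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
    $ docker tag myapp:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:latest
    $ docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:latest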