docker dockerfile containers docker-registry

Dockerfile RUN layers vs script

Docker version 19.03.12, build 48a66213fe

So in a dockerfile, if I have the following lines:

RUN yum install aaa \
        bbb \
        ccc && \
        <some cmd> && \
        <etc> && \
         <some cleanup>

is that a best practice? Should I keep yum part separate than when I call other <commands/scripts>?

If I want a cleaner (vs traceable) Dockerfile, what if I put those lines in a .sh script can just call that script (i.e. COPY followed by a RUN statement). Will the build step run each time, even though nothing is changes inside .sh script**?** Looking for some gotchas here.

I'm thinking, whatever packages are stable, have a separate RUN <those packages> i.e. in one layer and lines which depend upon / change frequently i.e. may use user-defined (docker build time CLI level args) keep those in separate RUN layer (so I can use layer cache effectively).

Wondering if you think keeping a cleaner Dockerfile (calling RUN some.sh) would be less efficient than a traceable Dockerfile (where everything is listed in Dockerfile what makes that image).

Thanks.

Solution

In terms of the final image filesystem, you will notice no difference if you RUN the commands directly, or RUN a script, or have multiple RUN commands. The number of layers and the size of the command string doesn't really make any difference at all.

What can you observe?

Particularly on the "classic" Docker build system, each RUN command becomes an image layer. In your example, you RUN yum install && ... && <some cleanup>; if this was split into multiple RUN commands then the un-cleaned-up content would be committed as part of the image and takes up space even though it's removed in a later layer.

"More layers" isn't necessarily bad on its own, unless you have so many layers that you hit an internal limit. The only real downside here is creating a layer with content that you're planning to delete, in which case its space will still be in the final image.
As a more specific example of this, there's an occasional pattern where an image installs some development-only packages, runs an installation step, and uninstalls the packages. An Alpine-based example might look like
```
RUN apk add --virtual .build-deps \
      gcc make \
 && make \
 && make install \
 && apk del .build-deps
```
In this case you must run the "install" and "uninstall" in the same RUN command; otherwise Docker will create a layer that includes the build-only packages.

(A multi-stage build may be an easier way to accomplish the same goal of needing build-only tools, but not including them in the final image.)
The actual text of the RUN command is visible in docker history and similar inspection commands.

And...that's kind of it. If you think it's more maintainable to keep the installation steps in a separate script (maybe you have some way to use the same script in a non-Docker context) then go for it. I'd generally default to keeping the steps spelled out in RUN commands, and in general try to keep those setup steps as light-weight as possible.