Is there any advantage to layer cache invalidation by using `ADD` instead of `RUN`?
Background
I frequently see Dockerfiles that install `wget` or `curl` just to `RUN wget …` or `RUN curl …` to fetch some dependency that cannot be found in package management. I suspect these could be converted to simple `ADD <url> <dest>` lines, which would at least obviate the need for adding `curl` or `wget` to the image.
Further, it seems like the Docker daemon could rely on HTTP cache invalidation to inform its own layer cache invalidation. At a minimum (e.g. in the absence of HTTP cache headers), it could `GET` the resource, hash it, and calculate invalidation the same way it does for local files.
NOTE: I am familiar with the usage of `ADD` vs. `RUN …`, but I am looking for a strong reason to choose one over the other. In particular, I want to know whether `ADD <url>` can behave any more intelligently with regard to layer cache invalidation.
Update: The Docker documentation was changed to advise using `ADD`, since it works better with Docker's build cache. However, if you do not need to preserve the downloaded file as part of the image, you should use a multi-stage build and bind-mount the file from the earlier stage instead of copying it. See https://docs.docker.com/build/building/best-practices/#add-or-copy. This requires a minimum Dockerfile syntax version of 1.2.
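A minimal sketch of that multi-stage pattern, reusing the `big.tar.xz` example from below (the stage names are illustrative, and the bind mount requires the BuildKit Dockerfile syntax):

```dockerfile
# syntax=docker/dockerfile:1

# Fetch stage: ADD gets the build cache's handling of remote URLs.
FROM scratch AS fetch
ADD https://example.com/big.tar.xz /big.tar.xz

FROM ubuntu AS build
# Bind-mount the archive from the fetch stage for the duration of this
# RUN command only; it is never written into a layer of this image.
RUN --mount=type=bind,from=fetch,source=/big.tar.xz,target=/big.tar.xz \
    mkdir -p /usr/src/things \
    && tar -xJf /big.tar.xz -C /usr/src/things \
    && make -C /usr/src/things all
```

With this arrangement the download is cached as its own layer in the `fetch` stage, while the final image contains only the extracted and built output.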
The original answer:
It is better to use `RUN wget …` or `RUN curl …` to download an archive than `ADD`. This allows you to extract the archive and delete the downloaded file in the same `RUN` command, which prevents the downloaded file from being stored in a layer of the image.
As the Docker documentation says: "using ADD to fetch packages from remote URLs is strongly discouraged"
Avoid using `ADD` to download an archive and then extracting it in separate `RUN` commands, as this creates an intermediate file that is stored in the image. Because of how Docker layers work, a subsequent `RUN` command can only mark the file as deleted, not actually remove it from the earlier layer. For example, the following Dockerfile will store an intermediate file called `big.tar.xz` in the image:
ADD https://example.com/big.tar.xz /usr/src/things/
RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
RUN make -C /usr/src/things all
Instead, you can use a single `RUN` command to download the archive, extract it, and run `make`, as shown in the following Dockerfile:
RUN mkdir -p /usr/src/things \
&& curl -SL https://example.com/big.tar.xz \
| tar -xJC /usr/src/things \
&& make -C /usr/src/things all
This Dockerfile will not create any intermediate files, and the downloaded archive is never stored in a layer of the image.
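On the original cache-invalidation question: with recent BuildKit syntax, `ADD <url>` can be pinned to a digest so the layer cache behaves deterministically. A sketch, assuming the `--checksum` flag is available in your Dockerfile syntax version (the digest below is a placeholder you would replace with the real one):

```dockerfile
# syntax=docker/dockerfile:1
# The layer is rebuilt only if the URL or the pinned checksum changes;
# the build also fails if the remote content no longer matches the digest.
ADD --checksum=sha256:<digest-of-big.tar.xz> https://example.com/big.tar.xz /usr/src/things/
```

This gives you reproducible cache invalidation for remote files, at the cost of updating the digest whenever the upstream file changes.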