How can I pass the GitLab job token into a Docker build without causing a cache miss?

We use the PyPI repositories built into our GitLab deployment to share internal packages across multiple internal projects. When we build our Docker images, we need to install those packages as part of image creation. However, the GitLab CI job token that we use to access the GitLab PyPI repository is a one-off token, so it is different every time we run the build.

Our Dockerfile starts something like this:

FROM python:3.9

WORKDIR /project

COPY poetry.lock pyproject.toml ./
RUN pip install poetry

ARG CI_JOB_TOKEN
RUN poetry config http-basic.gitlab-pypi-repo gitlab-ci-token ${CI_JOB_TOKEN}
RUN poetry install --no-interaction

Because we're using poetry and the versions are locked in poetry.lock, the poetry steps shouldn't need to rerun unless the poetry.lock file has changed. But because CI_JOB_TOKEN is different on every build, the RUN steps after the ARG line always miss the cache, so we have to rerun poetry install and everything downstream (which is actually where most of the work is) as well.

So is there a way that we can pass CI_JOB_TOKEN into the docker build but in a way that is ignored for the purposes of the cache? Or maybe there's another way to achieve this?

Solution

Use build secrets instead (requires BuildKit)

You can mount the secret at build time using the --mount argument to the RUN instruction. Suppose you have the following in a Dockerfile:

# ...
RUN --mount=type=secret,id=mysecret echo "$(cat /run/secrets/mysecret)" > .foo
RUN echo "another layer" > .bar

Then you can pass the secret into the build using the --secret flag.

On the first run, you'll see the RUN instruction executed, and if you inspect the .foo file, it contains the secret (because the RUN command echoed it to the file; in practice, this might be your poetry configuration, for example).

$ echo -n supersecret > ../secret.txt
$ docker build --secret id=mysecret,src=../secret.txt -t test .
# ...
 => [3/4] RUN --mount=type=secret,id=mysecret echo "$(cat /run/secrets/mysecret)" > .foo                                                0.2s
 => [4/4] RUN echo "another layer" > .bar                                                                                                         0.4s
# ...

On subsequent runs, the relevant layers remain cached even if the secret has changed, because the contents of a secret mount are not part of the layer's cache key:

$ echo -n newvalue > ../secret.txt
$ docker build --secret id=mysecret,src=../secret.txt -t test .
# ...
 => CACHED [3/4] RUN --mount=type=secret,id=mysecret echo "$(cat /run/secrets/mysecret)" > .foo                                         0.0s
 => CACHED [4/4] RUN echo "another layer" > .bar                                                                                                  0.0s
# ...

Of course, because the RUN instruction was cached, you would see the old secret value in .foo in the resulting build.

As a separate note, you should be aware that your poetry config command is writing to disk. This means that your secret will be contained in the resulting image layers, which may not be ideal from a security standpoint.
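
Applied to the Dockerfile in the question, this might look roughly like the sketch below. It makes a few assumptions: the secret id ci_job_token is a name I picked for illustration, the repository is named gitlab-pypi-repo as in the question, and poetry picks up credentials from POETRY_HTTP_BASIC_<NAME>_USERNAME / POETRY_HTTP_BASIC_<NAME>_PASSWORD environment variables (repository name uppercased, dashes replaced by underscores). Check that last part against the docs for your poetry version. Setting the variables inline on the RUN command means the token exists only for the duration of that command and is never written to a config file in a layer.

```dockerfile
# syntax=docker/dockerfile:1
# Build with BuildKit enabled, passing the token from the CI environment
# (older Docker versions may need DOCKER_BUILDKIT=1 set explicitly):
#   docker build --secret id=ci_job_token,env=CI_JOB_TOKEN -t test .
FROM python:3.9

WORKDIR /project

COPY poetry.lock pyproject.toml ./
RUN pip install poetry

# The secret is mounted at /run/secrets/ci_job_token for this RUN only.
# Its value does not affect the cache key, so this layer is rebuilt only
# when an earlier layer (e.g. the COPY of poetry.lock) changes.
RUN --mount=type=secret,id=ci_job_token \
    POETRY_HTTP_BASIC_GITLAB_PYPI_REPO_USERNAME=gitlab-ci-token \
    POETRY_HTTP_BASIC_GITLAB_PYPI_REPO_PASSWORD="$(cat /run/secrets/ci_job_token)" \
    poetry install --no-interaction
```

Note that there is no ARG CI_JOB_TOKEN line any more. Keeping it would bring the cache miss back, since build args used by a RUN step are part of its cache key.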