docker, caching, arm, buildx

In a multi-node Docker build, where is the layer cache stored? And when does cache garbage collection occur?


I am building a Docker image with BuildKit on a remote arm64 platform. To achieve this, I set up the buildx builder as follows:

$ docker buildx install
$ docker buildx create --name=multiarch --driver=docker-container
$ docker buildx create --name=multiarch --append --node=arm-docker --platform=linux/arm64 ssh://username@arm64.hostname.com
$ docker buildx use multiarch
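
For reference, the registered builders with their nodes and platforms can be listed, and the remote node booted explicitly (using the builder name from the commands above):

$ docker buildx ls
$ docker buildx inspect multiarch --bootstrap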

This setup step is working fine. I can then build the linux/arm64 image as follows:

# build 1: first remote build triggered from the local host
$ docker buildx build --platform=linux/arm64 /path/to/mydockerfile/

This results in the following build logs:

WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load         
[+] Building 188.5s (22/22) FINISHED                                                                                                                                                                       
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.1s 
 => => transferring dockerfile: 15.55kB                                                                                                                                                               0.1s
 => [internal] load .dockerignore                                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/node:16.14-bullseye-slim                                                                                                                           0.4s
 => CACHED [base 1/4] FROM docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                         0.0s
 => => resolve docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                     0.0s
 => [internal] load build context                                                                                                                                                                     0.1s
 => => transferring context: 64B                                                                                                                                                                      0.0s
 => [base 2/4] RUN apt update   && apt install -y git     gcc libgl1 libxi6 make     autoconf libtool pkg-config zlib1g-dev     python g++                                                           54.0s
...

My expectation is that subsequent builds will use the Docker layer cache. This is the case if I run the same command again immediately (notice the CACHED statements):

# build 2: second remote build triggered from the local host
$ docker buildx build --platform=linux/arm64 /path/to/mydockerfile/
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load         
[+] Building 0.7s (22/22) FINISHED                                                                                                                                                                         
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.1s
 => => transferring dockerfile: 15.55kB                                                                                                                                                               0.1s 
 => [internal] load .dockerignore                                                                                                                                                                     0.1s 
 => => transferring context: 2B                                                                                                                                                                       0.1s 
 => [internal] load metadata for docker.io/library/node:16.14-bullseye-slim                                                                                                                           0.3s 
 => [base 1/4] FROM docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                0.0s 
 => => resolve docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                     0.0s
 => [internal] load build context                                                                                                                                                                     0.0s
 => => transferring context: 64B                                                                                                                                                                      0.0s
 => CACHED [base 2/4] RUN apt update   && apt install -y git     gcc libgl1 libxi6 make     autoconf libtool pkg-config zlib1g-dev     python g++                                                     0.0s
 => CACHED [base 3/4] RUN mkdir -p /openedx/app /openedx/env                                                                                                                                          0.0s
...

But then, if I wait a few minutes and run the same command again, the layers are no longer cached:

# build 3: third remote build triggered from the local host
$ docker buildx build --platform=linux/arm64 /path/to/mydockerfile/
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 84.0s (20/23)                        
 => [internal] load .dockerignore                                                                                                                                                                     0.1s
 => => transferring context: 2B                                                                                                                                                                       0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.1s
 => => transferring dockerfile: 15.55kB                                                                                                                                                               0.1s
 => [internal] load metadata for docker.io/library/node:16.14-bullseye-slim                                                                                                                           0.7s
 => [auth] library/node:pull token for registry-1.docker.io                                                                                                                                           0.0s
 => [base 1/4] FROM docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                2.8s
 => => resolve docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                     0.0s
 => => sha256:f819730668ed6ce893055fe48519a7562f409787a8c6484124a4ac81dd3ee2f3 452B / 452B                                                                                                            0.1s
 => => sha256:b8fb756f2ecf8b649e48f14874a4fb7cb1f399655453fe64b7fda7aa3d1086b8 2.76MB / 2.76MB                                                                                                        0.2s
 => => sha256:73d6fb98900661e1457a72cec5591ccec70d16856c7d0fdfca36a8cdc14ac2fe 34.49MB / 34.49MB                                                                                                      0.6s
 => => sha256:5dcf03983304e3396f5948d3c624e30b772d1ff3509c706caed83ef28438f1da 4.04kB / 4.04kB                                                                                                        0.3s
 => => sha256:6d4a449ac69c579312443ded09f57c4894e7adb42f7406abd364f95982fafc59 30.07MB / 30.07MB                                                                                                      0.6s
 => => extracting sha256:6d4a449ac69c579312443ded09f57c4894e7adb42f7406abd364f95982fafc59                                                                                                             0.8s
 => => extracting sha256:5dcf03983304e3396f5948d3c624e30b772d1ff3509c706caed83ef28438f1da                                                                                                             0.0s
 => => extracting sha256:73d6fb98900661e1457a72cec5591ccec70d16856c7d0fdfca36a8cdc14ac2fe                                                                                                             1.0s
 => => extracting sha256:b8fb756f2ecf8b649e48f14874a4fb7cb1f399655453fe64b7fda7aa3d1086b8                                                                                                             0.1s
 => => extracting sha256:f819730668ed6ce893055fe48519a7562f409787a8c6484124a4ac81dd3ee2f3                                                                                                             0.0s
 => [internal] load build context                                                                                                                                                                     0.1s
 => => transferring context: 1.56kB                                                                                                                                                                   0.1s
 => [base 2/4] RUN apt update   && apt install -y git     gcc libgl1 libxi6 make     autoconf libtool pkg-config zlib1g-dev     python g++                                                           48.6s
...

I'm guessing this means that the layer cache garbage collection was somehow run between the second and the third runs.

But if I ssh to the remote arm node and build the image from there multiple times (using the default buildx builder, not the multiarch one), I can see that the layers are properly cached, and for a long time:

# build 4: after a few builds triggered directly on the arm64 host
$ docker buildx build --platform=linux/arm64 /path/to/mydockerfile/
[+] Building 0.5s (23/23) FINISHED                                                                                                                                                                         
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.0s
 => => transferring dockerfile: 15.55kB                                                                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/node:16.14-bullseye-slim                                                                                                                           0.4s
 => [base 1/4] FROM docker.io/library/node:16.14-bullseye-slim@sha256:d54981fe891c9e3442ea05cb668bc8a2a3ee38609ecce52c7b5a609fadc6f64b                                                                0.0s
 => [internal] load build context                                                                                                                                                                     0.0s
 => => transferring context: 64B                                                                                                                                                                      0.0s
 => CACHED [base 2/4] RUN apt update   && apt install -y git     gcc libgl1 libxi6 make     autoconf libtool pkg-config zlib1g-dev     python g++   
...

The difference between the two environments seems to imply that the layer cache is stored on the node where the buildx command is run, not on the arm64 remote host. This can be verified by pruning the build cache with:

docker buildx prune

This frees up some space on the local instance, seemingly confirming that the cache is stored locally.
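
For example, the amount of build cache held by the currently selected builder can be checked before and after pruning:

# check build cache usage of the current builder
$ docker buildx du
# per-entry details (descriptions, last-used times)
$ docker buildx du --verbose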

My hypothesis is that the layers are removed from cache by some garbage collector. Indeed, checking the Docker docs, it appears that there is a default layer cache garbage collection configuration file somewhere: https://docs.docker.com/build/cache/garbage-collection/

The /etc/buildkit/buildkitd.toml file does not exist on my system (Ubuntu 22.04). But I can create it and disable garbage collection from there:

[worker.oci]
  gc = false

The problem is that I cannot test this configuration, because after I ran the docker buildx prune command above, I am no longer facing the initial issue...

This was rather a lengthy brain dump, sorry about that :-/ I have the following questions:

  1. Am I right that the Buildkit layer cache is stored on the local host, and not on the remote?
  2. In which folder is the buildkit cache stored?
  3. Is there any way to view the activity of the cache garbage collector? for instance via some logs?
  4. How should I configure the buildkit cache garbage collector to extend cache duration?

Solution

  • I've had to search long and hard to answer these questions, so I'll post the answers just as much for myself as for the rest of the world.

    1. Am I right that the Buildkit layer cache is stored on the local host, and not on the remote?

    No, this is incorrect. The BuildKit cache is stored on the remote host where the BuildKit container is running.

    On the remote host, the container that runs BuildKit is called "buildx_buildkit_arm-docker" -- provided the builder was created using the command from the original question. The cache is stored in a volume attached to this container.

    2. In which folder is the buildkit cache stored?

    The volume attached to the BuildKit container can be found by running docker inspect buildx_buildkit_arm-docker. In my case, on the host, this volume is stored in /var/lib/docker/volumes/buildx_buildkit_arm-docker_state. The amount of data it uses can be monitored by running du -sm /var/lib/docker/volumes/buildx_buildkit_arm-docker_state/, or with docker system df (see the "Volumes" line).
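
    As a small sketch (assuming the container name above), the mount source of the volume can be printed directly with a Go template, instead of scanning the full docker inspect output:

    # prints the source path of each mount of the BuildKit container
    docker inspect -f '{{ range .Mounts }}{{ .Source }}{{ end }}' buildx_buildkit_arm-docker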

    3. Is there any way to view the activity of the cache garbage collector? for instance via some logs?

    The builder needs to be configured with debug = true (see below on how to configure the builder). Then, garbage collection events are logged to stdout by the BuildKit container. Thus, on the remote host:

    $ docker logs buildx_buildkit_arm-docker 2>&1 | grep garbage
    ...
    time="2023-06-15T18:49:12Z" level=debug msg="content garbage collected" d=7.403837ms
    ...
    
    4. How should I configure the buildkit cache garbage collector to extend cache duration?

    The builder is configured only when it is created for the first time. For instance, if you create a builder on two different hosts but only pass a configuration the second time, that configuration will be ignored, because the BuildKit container is already running. The new configuration will only be taken into account once the container is stopped, removed, and recreated by the next build (I think):

    docker stop buildx_buildkit_arm-docker
    docker container rm buildx_buildkit_arm-docker
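
    After that, the BuildKit container is recreated the next time the builder is used. As a sketch, it can also be booted explicitly from the local host (where the multiarch builder is defined), without running a full build:

    # boots all nodes of the builder, recreating missing BuildKit containers
    docker buildx inspect multiarch --bootstrap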
    

    To configure the builder, use the docker buildx create --config=./path/to/buildkitd.toml option. For instance, create buildkitd.toml on the local host:

    # https://docs.docker.com/build/buildkit/toml-configuration/
    # Enable debug logs
    debug = true
    [worker.oci]
      # Enable garbage collection
      gc = true
      # Not sure what this does. Keep data under 80GB?
      gckeepstorage = 80000000000
      # When storage exceeds 40GB delete data older than 4 days
      [[worker.oci.gcpolicy]]
        keepBytes = 40000000000
        keepDuration = 345600
      # Keep data under 80GB
      [[worker.oci.gcpolicy]]
        all = true
        keepBytes = 80000000000
    
    

    EDIT 2024/11/18: the buildkitd.toml spec is currently being updated, which means that the current docs are not accurate for most users just yet.

    Then, create the builder with:

    docker buildx create --name=multiarch --driver=docker-container
    docker buildx create --name=multiarch --config=./path/to/buildkitd.toml --append --node=arm-docker --platform=linux/arm64/v8 ssh://username@arm64.remotehost.com
    docker buildx use multiarch
    docker buildx install
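
    Note that if the multiarch builder already exists from a previous attempt, the configuration passed with --config will not be applied to it; as a sketch, remove the old builder before running the commands above, so that it is recreated from scratch:

    # removes the builder and the BuildKit containers it created on its nodes
    docker buildx rm multiarch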
    

    Once the BuildKit container has been created on the remote host (for instance after starting a build), you can verify that the configuration was properly loaded into the BuildKit container by running:

    docker exec buildx_buildkit_arm-docker cat /etc/buildkit/buildkitd.toml
    

    This should print a re-formatted version of your original configuration.
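
    Depending on the buildx and BuildKit versions, the effective garbage-collection policy may also be visible from the local host (the exact output varies between versions):

    # prints the status of each node; recent versions also list the GC policy rules
    docker buildx inspect multiarch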