We recently converted all our Jenkins docker builds to multi-arch, and it's generally working pretty well. They're building on ARM-based instances, so the x86 builds are going through qemu, which takes a long time, and also has some glitches (for a couple Go projects, in particular).
I'm happy to spin up some dedicated x86 build resources. The only tricky bit is that our build agents are autoscaled, so some of the recipes I've seen for configuring a "remote" builder that knows a particular architecture would require more code to find an active agent.
Our docker orchestration framework lets us expand any running service's ip/port into the environment of another service as a dependency, so I could probably create a dummy image that just wraps the docker buildx create step, though it would need to hang around and detect when the remote builder has disappeared and been replaced. But it occurred to me that it would be cleaner to just add buildkit to my cluster manifest directly and pass the remote builder in that way.
Is there a way to directly docker run a buildkit instance in the same way that docker buildx create would?
(Side note: if I can talk to buildkit through haproxy and it isn't going to assume that it gets the same backend every time it connects, that would greatly simplify things, as I could just point the remote at the reverse proxy and let the service routing rules track the dynamic backends.)
Turns out that this is actually pretty straightforward.
On my autoscaled Intel build machines, I'm doing the equivalent of
docker run -d --privileged -p 8086:8086 moby/buildkit:latest --addr tcp://0.0.0.0:8086
I do the same on the ARM build machines, except on a different port:
docker run -d --privileged -p 8085:8085 moby/buildkit:latest --addr tcp://0.0.0.0:8085
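To sanity-check a freshly started daemon, you can point buildctl (the buildkit CLI client, included in the moby/buildkit image and in buildkit's release tarballs) at it; the address below is a placeholder for the build machine's IP and port:
# List the remote daemon's workers and the platforms they support.
buildctl --addr tcp://<addr>:8086 debug workers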
On my Jenkins JNLP agents, I pass in (arbitrarily named) env vars to define the local and remote platforms as either linux/amd64 or linux/arm64, and env vars to define the local and remote servers created above, as tcp://<addr>:<port>.
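For example, on an arm64 agent they might look like this (the values are placeholders; the proxy hostname is explained below):
# Illustrative values for an arm64 agent: the local buildkitd runs on this host,
# while the amd64 one is reached through the proxy described further down.
export BUILDKIT_LOCAL_PLATFORM=linux/arm64
export BUILDKIT_LOCAL_SERVER=tcp://<local-ip-address>:8085
export BUILDKIT_REMOTE_PLATFORM=linux/amd64
export BUILDKIT_REMOTE_SERVER=tcp://devproxy.mycorp.com:8086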
# Derive node names like "arm64-builder" / "amd64-builder" from the platform strings.
local_name="$(echo "$BUILDKIT_LOCAL_PLATFORM" | sed -E 's/(.+\/)?(.*)/\2/')-builder"
docker buildx create --use --name multi-builder --node "${local_name}" \
  --platform="${BUILDKIT_LOCAL_PLATFORM}" \
  --driver remote "${BUILDKIT_LOCAL_SERVER}"

remote_name="$(echo "$BUILDKIT_REMOTE_PLATFORM" | sed -E 's/(.+\/)?(.*)/\2/')-builder"
docker buildx create --append --name multi-builder --node "${remote_name}" \
  --platform="${BUILDKIT_REMOTE_PLATFORM}" \
  --driver remote "${BUILDKIT_REMOTE_SERVER}"
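With both nodes registered, a single invocation fans each platform out to its matching builder (the image tag here is just a placeholder):
# Build both architectures in one go and push the multi-arch manifest.
docker buildx build --builder multi-builder \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/myapp:latest --push .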
The local server is tcp://<local-ip-address>:8085 or tcp://<local-ip-address>:8086 depending on the architecture, and the remote server... well, that depends. If you have a single build machine for each architecture, it's easy: just hardcode the name or IP in the same syntax. I went for bonus points by putting them behind haproxy in an autoscaling group, so my remote server address is the DNS name of my proxy. In case anyone finds this and is looking to do something similar, it actually works surprisingly well to have the build servers behind a proxy! My setup looks like this:
frontend buildkit-amd
    bind *:8086
    mode tcp
    default_backend buildkit-amd
    acl corp-ip src -f /bastion/haproxy/allow/corp.allow
    tcp-request connection reject if !corp-ip
    timeout client 60m
    timeout connect 60s

frontend buildkit-arm
    bind *:8085
    mode tcp
    default_backend buildkit-arm
    acl corp-ip src -f /bastion/haproxy/allow/corp.allow
    tcp-request connection reject if !corp-ip
    timeout client 60m
    timeout connect 60s
with the backend server lists generated dynamically (I use etcd/confd):
backend buildkit-arm
    mode tcp
    balance leastconn
    timeout server 60m
    server buildkit-arm-2ef8437b8c226e0cee4b13d1e216906fe4eda78c7147c33b79d131dbaa0f2f64 10.2.1.60:8085
    server buildkit-arm-868212ff42b1dc35be162182c219f15a9df161424b5517964402750ff730407e 10.2.3.80:8085

backend buildkit-amd
    mode tcp
    balance leastconn
    timeout server 60m
    server buildkit-amd-21de68bf69b323ee5a7c4c388d878d32249f89af2abca16053b0331dc11296ce 10.2.5.13:8086
    server buildkit-amd-25b06ab35f2f34b333d8f8214c11d6aecb84c3fcfca19d63c0746cafa60f5416 10.2.1.239:8086
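If you're curious what the generation looks like, a minimal confd template for one of these backends could be roughly as follows; the /buildkit/arm/* key layout is purely illustrative, not necessarily what you'll want:
backend buildkit-arm
    mode tcp
    balance leastconn
    timeout server 60m
{{range gets "/buildkit/arm/*"}}    server buildkit-arm-{{base .Key}} {{.Value}}:8085
{{end}}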
Note that I failed to get this working with haproxy's h2 mode, but tcp mode seems to work fine. Also note the increased timeouts, which are necessary for the long-lived connection. (Our setup is protected by an IP whitelist for our dev VPN; if you're more security-minded, you'll want certs and whatnot.)
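If you do go the cert route, buildkitd can terminate mutual TLS itself and the remote driver can present a client certificate. A rough sketch, with placeholder paths, that I haven't run in this exact form:
# Server side: buildkitd with TLS material mounted in (paths are placeholders).
docker run -d --privileged -p 8086:8086 -v /etc/buildkit/certs:/certs:ro \
  moby/buildkit:latest --addr tcp://0.0.0.0:8086 \
  --tlscacert /certs/ca.pem --tlscert /certs/cert.pem --tlskey /certs/key.pem

# Client side: hand the matching client certs to the remote driver.
docker buildx create --driver remote \
  --driver-opt cacert=/certs/client/ca.pem,cert=/certs/client/cert.pem,key=/certs/client/key.pem \
  tcp://devproxy.mycorp.com:8086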
When I log into a build worker and docker exec a shell inside my JNLP agent, I can check my setup like this:
# docker buildx ls
NAME/NODE           DRIVER/ENDPOINT                    STATUS    BUILDKIT  PLATFORMS
multi-builder*      remote
 \_ arm64-builder    \_ tcp://10.2.3.80:8085           running   v0.16.0   linux/arm64*, linux/arm/v7, linux/arm/v6
 \_ amd64-builder    \_ tcp://devproxy.mycorp.com:8086  running   v0.16.0   linux/amd64*, linux/amd64/v2, linux/amd64/v3, linux/386
default             docker
 \_ default          \_ default                        running   v0.16.0   linux/arm64
If everything is working, you'll see the "running" status on all your build workers. (One completely surprising thing for me is that docker buildx ls shows different results if I run it at the host level vs. inside a container that mounts /var/run/docker.sock! Most docker stuff is the same, but buildx pays attention to docker contexts, which is not an area I've explored much.)
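As far as I can tell, the difference is that buildx keeps its builder definitions client-side, in the Docker CLI's config directory, so a container with its own home directory has its own builder list even though it shares the daemon socket. You can compare the two views from the host and from inside the agent:
# Run these both on the host and inside the container sharing /var/run/docker.sock.
docker context ls        # contexts are stored per client
ls ~/.docker/buildx      # client-side buildx state (builder definitions) lives here
docker buildx ls         # lists the builders known to this particular client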
This was a pretty niche Q (&A) but I hope it helps someone out there!