docker, kubernetes, google-kubernetes-engine, docker-buildkit

Rootless buildkitd throws permission error inside container


I decided to use the rootless version of BuildKit to build and push Docker images to Google Container Registry (GCR) from within a container in Kubernetes.

I stumbled upon this error:

/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount859701112: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/2 Options:[rbind ro]}]: operation not permitted

I am running buildkitd as a Deployment exposed by a Service, as specified in the buildkit documentation. Those resources run inside a Kubernetes cluster hosted on Google Kubernetes Engine.

I am using the following YAML for the Deployment and Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
      annotations:
        container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
        container.seccomp.security.alpha.kubernetes.io/buildkitd: unconfined
    spec:
      containers:
      - name: buildkitd
        image: moby/buildkit:master-rootless
        args:
        - --addr
        - unix:///run/user/1000/buildkit/buildkitd.sock
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        readinessProbe:
          exec:
            command:
            - buildctl
            - debug
            - workers
          initialDelaySeconds: 5
          periodSeconds: 30
        livenessProbe:
          exec:
            command:
            - buildctl
            - debug
            - workers
          initialDelaySeconds: 5
          periodSeconds: 30
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
        ports:
        - containerPort: 1234
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  ports:
  - port: 1234
    protocol: TCP
  selector:
    app: buildkitd

It is the same as the buildkit documentation's, minus the TLS certificate setup.

From another Pod, I then contact the Buildkit Daemon using the following command:

./bin/buildctl \
    --addr tcp://buildkitd:1234 \
    build \
    --frontend=dockerfile.v0 \
    --local context=. \
    --local dockerfile=. \
    --output type=image,name=eu.gcr.io/$PROJECT_ID/test-image,push=true

The buildkitd container successfully receives the request but throws the error above.

The output of the buildctl command is the following:

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.1s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 120B done
#2 DONE 0.1s
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount859701112: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/2 Options:[rbind ro]}]: operation not permitted

This is the same error reported by the daemon.

What strikes me is that I am able to run buildkitd inside a minikube cluster using the exact same YAML file:

NAME                             READY   STATUS    RESTARTS   AGE
pod/buildkitd-5b46d94f5d-xvnbv   1/1     Running   0          36m

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/buildkitd    ClusterIP   10.100.72.194   <none>        1234/TCP   36m
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    36m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/buildkitd   1/1     1            1           36m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/buildkitd-5b46d94f5d   1         1         1       36m

I deploy the Service and Deployment inside minikube and forward the service port with the following command, so that the deployment is reachable from outside minikube:

kubectl port-forward service/buildkitd 2000:1234

With that setup I am able to execute my buildctl command without any issue (the image is built and pushed to GCR).
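As a quick connectivity check before running a full build, the forwarded daemon can be queried for its workers (this assumes the `kubectl port-forward` above is active and `buildctl` is on the PATH; the check is skipped otherwise):

```shell
# Sanity-check the forwarded buildkitd daemon on the local forwarded port.
# Requires `kubectl port-forward service/buildkitd 2000:1234` to be running.
if command -v buildctl >/dev/null 2>&1; then
    STATUS=$(buildctl --addr tcp://127.0.0.1:2000 debug workers >/dev/null 2>&1 && echo "ok" || echo "unreachable")
else
    STATUS="skipped (buildctl not installed)"
fi
echo "daemon check: $STATUS"
```

If the daemon is healthy, `buildctl debug workers` lists the same worker that appears in the startup log below.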

I wish to understand why it works on minikube but not on Google Kubernetes Engine.

Here is the container startup log, in case that helps:

auto snapshotter: using native
NoProcessSandbox is enabled. Note that NoProcessSandbox allows build containers to kill (and potentially ptrace) an arbitrary process in the BuildKit host namespace. NoProcessSandbox should be enabled only when the BuildKit is running in a container as an unprivileged user.
found worker "wdukby0uwmjyvf2ngj4e71s4m", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:buildkitd-5b46d94f5d-xvnbv org.mobyproject.buildkit.worker.snapshotter:native], platforms=[linux/amd64 linux/386]
rootless mode is not supported for containerd workers. disabling containerd worker.
found 1 workers, default="wdukby0uwmjyvf2ngj4e71s4m"
currently, only the default worker can be used.
TLS is not enabled for tcp://0.0.0.0:1234. enabling mutual TLS authentication is highly recommended
running server on /run/user/1000/buildkit/buildkitd.sock
running server on [::]:1234

Solution

  • Rootless mode requires various preparation steps to be performed on the host (this needs to be done outside of Kubernetes, on the VM acting as the Kubernetes node). See the rootless documentation for the full list of steps. Note that these steps vary by Linux distribution, because some distributions have already performed some or all of these prerequisites.

    Ubuntu

    • No preparation is needed.

    • overlay2 storage driver is enabled by default (Ubuntu-specific kernel patch).

    • Known to work on Ubuntu 16.04, 18.04, and 20.04.

    Debian GNU/Linux

    • Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

    • To use the overlay2 storage driver (recommended), run sudo modprobe overlay permit_mounts_in_userns=1 (Debian-specific kernel patch, introduced in Debian 10). Add the configuration to /etc/modprobe.d for persistence.

    • Known to work on Debian 9 and 10. overlay2 is only supported since Debian 10 and needs modprobe configuration described above.

    Arch Linux

    • Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system

    openSUSE

    • sudo modprobe ip_tables iptable_mangle iptable_nat iptable_filter is required. This might be required on other distros as well depending on the configuration.

    • Known to work on openSUSE 15.

    Fedora 31 and later

    • Fedora 31 uses cgroup v2 by default, which is not yet supported by the containerd runtime. Run sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0" to use cgroup v1.

    • You might need sudo dnf install -y iptables.

    CentOS 8

    • You might need sudo dnf install -y iptables.

    CentOS 7

    • Add user.max_user_namespaces=28633 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

    • systemctl --user does not work by default. Run the daemon directly without systemd: dockerd-rootless.sh --experimental --storage-driver vfs

    • Known to work on CentOS 7.7. Older releases require additional configuration steps.

    • CentOS 7.6 and older releases require COPR package vbatts/shadow-utils-newxidmap to be installed.

    • CentOS 7.5 and older releases require running sudo grubby --update-kernel=ALL --args="user_namespace.enable=1", followed by a reboot.
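As a quick way to see which of the prerequisites above a given node already satisfies, something like the following sketch can be run on the node itself. The sysctl names are the ones listed above; the Debian-specific knob simply does not exist on other distributions, and a value of 0 (or a missing newuidmap/newgidmap) indicates the node still needs preparation:

```shell
#!/bin/sh
# Report the host-side prerequisites for rootless BuildKit.

# Maximum number of user namespaces (CentOS 7 ships this as 0 by default).
USERNS_MAX=$(cat /proc/sys/user/max_user_namespaces 2>/dev/null || echo "n/a")
echo "user.max_user_namespaces:         $USERNS_MAX"

# Debian-specific toggle for unprivileged user namespace creation.
USERNS_CLONE=$(cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null || echo "n/a")
echo "kernel.unprivileged_userns_clone: $USERNS_CLONE"

# newuidmap/newgidmap are needed for subordinate UID/GID mapping
# (on CentOS 7.6 and older they come from the vbatts/shadow-utils-newxidmap COPR).
for bin in newuidmap newgidmap; do
    path=$(command -v "$bin" 2>/dev/null || echo "NOT FOUND")
    echo "$bin: $path"
done
```

On GKE the node image is managed by Google, so whether these knobs can be changed (and whether they persist across node upgrades) depends on the node image in use; this script only reports the current state.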