
Jenkins on EKS, mount EFS in DIND


I am running the Jenkins master and agents on Kubernetes. Some of the CI workloads require a Docker image to be built and pushed to ECR. I installed Jenkins using its Helm chart and run the DIND agent as a separate agent. Here are the values I am passing to the Jenkins chart:

    controller:
      numExecutors: 0
      image:
        registry: "1234567890.dkr.ecr.us-east-1.amazonaws.com"
        repository: "jenkins"
        tag: latest
      installPlugins:
        - configuration-as-code:latest
        - kubernetes:latest
        - workflow-aggregator:latest
        - git:latest  
      resources:
        requests:
          cpu: "50m"
          memory: "256Mi"
        limits:
          cpu: "2000m"
          memory: "4096Mi"
      nodeSelector:
        usage: jenkins
      ingress:
        enabled: true
        apiVersion: "networking.k8s.io/v1"
        hostName: "jenkins.domain.com"
        ingressClassName: nginx-internal
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "100m"
          nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
          nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
          type: internal
          cert-manager.io/cluster-issuer: letsencrypt
        tls: 
          - secretName: jenkins.domain.com
            hosts:
              - jenkins.domain.com
      containerSecurityContext:
        runAsUser: 1000
        runAsGroup: 1000
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      existingSecret: "jenkins-credentials"                                               
    agent:
      enabled: true
      image:
        repository: 1234567890.dkr.ecr.us-east-1.amazonaws.com/jenkins-worker
        tag: latest  
      podRetention: "Onfailure"
      namespace: jenkins
      resources:
        requests:
          cpu: "512m"
          memory: "512Mi"
      nodeSelector:
        usage: jenkins
      runAsUser: 1000
      runAsGroup: 1000
    persistence:
      enabled: true
      storageClassName: "gp2"
      size: "100Gi"
    serviceAccountAgent:
      create: true
    additionalAgents: 
      dind:
        podName: dind-agent
        customJenkinsLabels: dind-agent
        image: 
          repository: 1234567890.dkr.ecr.us-east-1.amazonaws.com/jenkins-worker-dind
          tag: v2
        envVars:
         - name: DOCKER_HOST
           value: "tcp://localhost:2375"
        alwaysPullImage: true
        volumes:
        - type: PVC
          claimName: jenkins-worker-pvc
          mountPath: "/var/lib/jenkins/shared-disk"
        yamlTemplate:  |-  
          spec:
            securityContext:
                privileged: true 
            containers:
              - name: dind-daemon 
                image: 1234567890.dkr.ecr.us-east-1.amazonaws.com/docker-dind:v2
                imagePullPolicy: Always
                securityContext: 
                  privileged: true
                env: 
                  - name: DOCKER_TLS_VERIFY
                    value: ""
                  - name: DOCKER_TLS_CERTDIR
                    value: ""

I created two Docker images for the DIND agent. The first is the Jenkins JNLP agent image, on top of which I installed Python, boto3, and botocore. The second is docker-dind, on top of which I installed python3, boto3, and botocore.

Jenkins JNLP Dockerfile:

FROM jenkins/jnlp-agent-docker

USER root

RUN apk update \
   && apk add --no-cache \
      ca-certificates \
      curl \
      gnupg \
      wget \
      tar \
      git \
      htop \
      iftop \
      jq \
      unzip \
      python3 \
      py3-pip \
      aws-cli \
      tmux \
      msmtp \
      build-base \
      nodejs

# Install boto3 and botocore system-wide
RUN pip3 install --no-cache --break-system-packages boto3 botocore

COPY entrypoint.sh /entrypoint.sh
RUN chown jenkins:jenkins /entrypoint.sh
RUN chmod +x /entrypoint.sh    

# Return to the Jenkins user
USER jenkins

ENTRYPOINT ["/entrypoint.sh"]

Jenkins JNLP entrypoint.sh:

#!/usr/bin/env bash

RETRIES=6

sleep_exp_backoff=1

for((i=0;i<RETRIES;i++)); do
    docker version
    dockerd_available=$?
    if [ $dockerd_available == 0 ]; then
        break 
    fi
    sleep ${sleep_exp_backoff}
    sleep_exp_backoff="$((sleep_exp_backoff * 2))"
done

exec /usr/local/bin/jenkins-agent "$@"

docker-dind Dockerfile:

FROM docker:27.0.3-dind

# Install necessary dependencies and the latest version of awscli
RUN apk --update-cache add \
        bash \
        gcc \
        musl-dev \
        libffi-dev \
        openssl-dev \
        make \
        zlib-dev \
        python3 \
        py3-pip \
        aws-cli \
    && sed -i 's/ash/bash/g' /etc/passwd \
    && apk --no-cache del \
        gcc \
        musl-dev \
        libffi-dev \
        openssl-dev \
        make \
        zlib-dev \
    && rm -rf /var/cache/apk/* \
    && docker --version \
    && aws --version

# Install boto3 and botocore system-wide
RUN pip3 install --no-cache --break-system-packages boto3 botocore


CMD /bin/bash

I created a test job that uses the DIND agent. The agent pod gets stuck in the ContainerCreating stage with the following error:

Warning  FailedMount  6s (x7 over 39s)  kubelet            MountVolume.SetUp failed for volume "pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9" : rpc error: code = Internal desc = Could not mount "fs-01a1ca999fc999985:/" at "/var/lib/kubelet/pods/073669a3-1186-4e58-8fb9-21e8cb710891/volumes/kubernetes.io~csi/pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o accesspoint=fsap-0745858584ced86df,tls fs-01a5u5u5u1985:/ /var/lib/kubelet/pods/073669a3-1186-4e58-8fb9-21e8cb710891/volumes/kubernetes.io~csi/pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9/mount
Output: Failed to resolve "fs-01a1ca999fc999.efs.us-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first
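
The first half of that output is a DNS failure, so one thing that can be checked by hand is whether the EFS DNS name resolves from inside the VPC. The snippet below is only a rough sketch: `efs_dns_name` and `check_resolves` are hypothetical helper names (not part of any AWS tool), the name format is copied from the error message above, and `getent` is assumed to be available wherever the check runs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the regional EFS DNS name for a file system ID,
# in the same <fs-id>.efs.<region>.amazonaws.com form shown in the error.
efs_dns_name() {
    local fs_id="$1" region="$2"
    printf '%s.efs.%s.amazonaws.com\n' "$fs_id" "$region"
}

# Hypothetical helper: report whether a host name resolves from this machine.
# Assumes getent is installed; returns non-zero when the lookup fails.
check_resolves() {
    local host="$1"
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "resolves: $host"
    else
        echo "does NOT resolve: $host"
        return 1
    fi
}

# Example (run on a worker node or in a pod in the same VPC):
#   check_resolves "$(efs_dns_name fs-01a1ca999fc999 us-east-1)"
```

If the name does not resolve, the mount helper falls back to looking up the mount target IP with botocore, which is the second failure reported in the output.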

I searched all over Google and read several Stack Overflow, Medium, and other articles. I have basically hit a dead end and am hoping someone here can help.


Solution

  • I fixed this by updating the AWS EFS CSI driver to v2.0.7 and updating the pod template for Jenkins.
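
A minimal sketch of how the driver version could be checked before upgrading. It assumes the driver runs as the default `efs-csi-node` DaemonSet in `kube-system` with a container named `efs-plugin`, and that it was installed from the `aws-efs-csi-driver` Helm chart; an install managed as an EKS add-on would be upgraded differently. `version_lt` and `current_efs_csi_tag` are hypothetical helper names, and `sort -V` is assumed to be available.

```shell
#!/usr/bin/env bash

# Return success (0) when version $1 is strictly older than version $2.
# Relies on `sort -V` for version-aware ordering.
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the image tag of the EFS CSI node plugin.
# Assumes the default DaemonSet and container names described above.
current_efs_csi_tag() {
    kubectl get daemonset efs-csi-node -n kube-system \
        -o jsonpath='{.spec.template.spec.containers[?(@.name=="efs-plugin")].image}' \
        | awk -F: '{print $NF}'
}

# Example (requires kubectl and helm configured for the cluster):
#   tag="$(current_efs_csi_tag)"
#   if version_lt "$tag" "v2.0.7"; then
#       helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
#       helm repo update
#       helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
#           --namespace kube-system --set image.tag=v2.0.7
#   fi
```

The pod template change is not shown here because the post does not say what was changed.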