I am running the Jenkins master and agents on Kubernetes. Some CI workloads require a Docker image to be built and pushed to ECR. I installed Jenkins using its Helm chart and run the DIND agent as a separate agent. Here are the values I am passing to the Jenkins chart:
controller:
  numExecutors: 0
  image:
    registry: "1234567890.dkr.ecr.us-east-1.amazonaws.com"
    repository: "jenkins"
    tag: latest
  installPlugins:
    - configuration-as-code:latest
    - kubernetes:latest
    - workflow-aggregator:latest
    - git:latest
  resources:
    requests:
      cpu: "50m"
      memory: "256Mi"
    limits:
      cpu: "2000m"
      memory: "4096Mi"
  nodeSelector:
    usage: jenkins
  ingress:
    enabled: true
    apiVersion: "networking.k8s.io/v1"
    hostName: "jenkins.domain.com"
    ingressClassName: nginx-internal
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "100m"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
      type: internal
      cert-manager.io/cluster-issuer: letsencrypt
    tls:
      - secretName: jenkins.domain.com
        hosts:
          - jenkins.domain.com
  containerSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
  existingSecret: "jenkins-credentials"
agent:
  enabled: true
  image:
    repository: 1234567890.dkr.ecr.us-east-1.amazonaws.com/jenkins-worker
    tag: latest
  podRetention: "Onfailure"
  namespace: jenkins
  resources:
    requests:
      cpu: "512m"
      memory: "512Mi"
  nodeSelector:
    usage: jenkins
  runAsUser: 1000
  runAsGroup: 1000
persistence:
  enabled: true
  storageClassName: "gp2"
  size: "100Gi"
serviceAccountAgent:
  create: true
additionalAgents:
  dind:
    podName: dind-agent
    customJenkinsLabels: dind-agent
    image:
      repository: 1234567890.dkr.ecr.us-east-1.amazonaws.com/jenkins-worker-dind
      tag: v2
    envVars:
      - name: DOCKER_HOST
        value: "tcp://localhost:2375"
    alwaysPullImage: true
    volumes:
      - type: PVC
        claimName: jenkins-worker-pvc
        mountPath: "/var/lib/jenkins/shared-disk"
    yamlTemplate: |-
      spec:
        securityContext:
          privileged: true
        containers:
          - name: dind-daemon
            image: 1234567890.dkr.ecr.us-east-1.amazonaws.com/docker-dind:v2
            imagePullPolicy: Always
            securityContext:
              privileged: true
            env:
              - name: DOCKER_TLS_VERIFY
                value: ""
              - name: DOCKER_TLS_CERTDIR
                value: ""
I created two Docker images for the DIND agent. The first is the Jenkins JNLP agent image with python3, boto3, and botocore installed on top. The second is the docker-dind image, also with python3, boto3, and botocore installed on top.
Jenkins-jnlp Dockerfile:
FROM jenkins/jnlp-agent-docker
USER root
RUN apk update \
    && apk add --no-cache \
        ca-certificates \
        curl \
        gnupg \
        wget \
        tar \
        git \
        htop \
        iftop \
        jq \
        unzip \
        python3 \
        py3-pip \
        aws-cli \
        tmux \
        msmtp \
        build-base \
        nodejs
# Install boto3 and botocore system-wide
RUN pip3 install --no-cache-dir --break-system-packages boto3 botocore
COPY entrypoint.sh /entrypoint.sh
RUN chown jenkins:jenkins /entrypoint.sh
RUN chmod +x /entrypoint.sh
# Return to the Jenkins user
USER jenkins
ENTRYPOINT ["/entrypoint.sh"]
Jenkins-jnlp entrypoint.sh:
#!/usr/bin/env bash
# Wait for the Docker daemon with exponential backoff, then start the agent.
RETRIES=6
sleep_exp_backoff=1
for ((i = 0; i < RETRIES; i++)); do
  # `docker version` exits 0 only once the daemon is reachable
  if docker version; then
    break
  fi
  sleep "${sleep_exp_backoff}"
  sleep_exp_backoff="$((sleep_exp_backoff * 2))"
done
exec /usr/local/bin/jenkins-agent "$@"
Docker-dind Dockerfile:
FROM docker:27.0.3-dind
# Install necessary dependencies and the latest version of awscli
RUN apk --update-cache add \
        bash \
        gcc \
        musl-dev \
        libffi-dev \
        openssl-dev \
        make \
        zlib-dev \
        python3 \
        py3-pip \
        aws-cli \
    && sed -i 's/ash/bash/g' /etc/passwd \
    && apk --no-cache del \
        gcc \
        musl-dev \
        libffi-dev \
        openssl-dev \
        make \
        zlib-dev \
    && rm -rf /var/cache/apk/* \
    && docker --version \
    && aws --version
# Install boto3 and botocore system-wide
RUN pip3 install --no-cache --break-system-packages boto3 botocore
CMD /bin/bash
I created a test job that uses the DIND agent. The agent pod gets stuck in the ContainerCreating state with the following error:
Warning FailedMount 6s (x7 over 39s) kubelet MountVolume.SetUp failed for volume "pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9" : rpc error: code = Internal desc = Could not mount "fs-01a1ca999fc999985:/" at "/var/lib/kubelet/pods/073669a3-1186-4e58-8fb9-21e8cb710891/volumes/kubernetes.io~csi/pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o accesspoint=fsap-0745858584ced86df,tls fs-01a5u5u5u1985:/ /var/lib/kubelet/pods/073669a3-1186-4e58-8fb9-21e8cb710891/volumes/kubernetes.io~csi/pvc-1744e00e-8e8a-44fd-b57f-8c3c4afd3ca9/mount
Output: Failed to resolve "fs-01a1ca999fc999.efs.us-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first
I have searched Google and read several Stack Overflow answers, Medium posts, and other articles, but I have hit a dead end. I am hoping someone here can help.
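For anyone hitting the same error: the mount is performed by the EFS CSI node plugin on the worker node, not inside the agent containers, so the botocore message most likely refers to the driver's own environment rather than to the Jenkins images. A quick sanity check is whether the file system's DNS name resolves from the node at all. Below is a minimal sketch, assuming `getent` is available where you run it. The function names `efs_dns_name` and `efs_resolves` are made up for illustration, and the file system ID is the one from the error output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any AWS tooling.

# Build the regional DNS name that the mount helper reports it failed to resolve.
efs_dns_name() {
  local fs_id="$1" region="$2"
  printf '%s.efs.%s.amazonaws.com\n' "$fs_id" "$region"
}

# Return 0 if the given host name resolves from where this runs, non-zero otherwise.
# Assumption: getent is installed (it usually is on glibc-based node images).
efs_resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# Example, to be run on a worker node in the same VPC as the file system:
#   name="$(efs_dns_name fs-01a1ca999fc999 us-east-1)"
#   if efs_resolves "$name"; then echo "resolves"; else echo "does not resolve"; fi
```

If the name does not resolve on the node, the usual suspects are the ones the error message itself names: a wrong file system ID or a missing EFS mount target in the VPC.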
I fixed this by updating the AWS EFS CSI driver to v2.0.7 and updating the pod template for Jenkins.
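To confirm which driver version a cluster is actually running before and after such an upgrade, something like the following may help. It is only a sketch: the DaemonSet name `efs-csi-node`, the `kube-system` namespace, and the container name `efs-plugin` are assumptions based on a default install and may differ in your cluster. The helper function names are hypothetical, and `sort -V` is assumed to be available (GNU coreutils and BusyBox both provide it).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Extract the tag from an image reference such as "repo/name:v2.0.7".
# Assumption: the reference carries a tag and no digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Return 0 if version $1 is >= version $2; a leading "v" is ignored.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  # The smaller of the two versions sorts first; it must be the wanted one.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Read the driver image from the node DaemonSet.
# Assumptions: DaemonSet "efs-csi-node" in "kube-system", container "efs-plugin".
current_driver_image() {
  kubectl get daemonset efs-csi-node -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="efs-plugin")].image}'
}

# Example:
#   tag="$(image_tag "$(current_driver_image)")"
#   if version_at_least "$tag" v2.0.7; then echo "driver is at least v2.0.7"; fi
```

This only checks the version. The pod template change mentioned above is specific to the Jenkins setup and is not covered here.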