We have been setting up a new Jenkins instance, and as we onboard projects we create Docker images that run as K8s pod agents executing the various pipelines. Suddenly, new agents have started throwing the following errors:
```
cp: cannot create regular file '/home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f/script.sh.copy': Permission denied
11:23:55 sh: 1: cannot create /home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f/jenkins-log.txt: Permission denied
11:23:55 sh: 1: cannot create /home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f/jenkins-result.txt.tmp: Permission denied
11:23:55 touch: cannot touch '/home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f/jenkins-log.txt': Permission denied
11:23:55 mv: cannot stat '/home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f/jenkins-result.txt.tmp': No such file or directory
...
11:29:02 process apparently never started in /home/jenkins/agent/workspace/Agents/Corretto-Maven@tmp/durable-5eb2af2f
```
In the K8s dashboard we can see that the container starts, but nothing else happens until the build fails. Searching for the error, the most relevant hit pointed at the Durable Task Plugin, so we checked it and found we already have the latest version, 577.v2a_8a_4b_7c0247. Subsequent searches yielded nothing of value for this error.
It was even more puzzling that every Docker image agent created before August still runs perfectly, with no sign of these errors. More puzzling still: we created a new Docker image from a basic Dockerfile we had used before August, and while the old image runs perfectly, the new one exhibits the same error conditions as all the other new images. The Dockerfile:
```dockerfile
# Base image to customize a Jenkins remote agent.
FROM ubuntu:20.04

# variables
ENV USERNAME=jenkins
ENV USERDIR=/var/$USERNAME

# add a user and group
RUN useradd -u 1001 -U -c $USERNAME -d $USERDIR -m -s /bin/bash $USERNAME
RUN mkdir /home/$USERNAME && chown $USERNAME:$USERNAME /home/$USERNAME
WORKDIR /home/$USERNAME

# connection files required (connects to various services, such as Git, Artifactory, etc.)
COPY jen_files.tar $USERDIR/
RUN tar xvf $USERDIR/jen_files.tar --directory $USERDIR/ && rm -f $USERDIR/jen_files.tar

# install tools
RUN apt-get update && apt-get install -y \
    jq \
    git \
    tar \
    zip \
    curl \
    wget \
    sudo

USER $USERNAME
CMD ["/bin/bash", "-c", "bash"]
```
The pod definition (ubuntu-pod.yaml):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-pod-yaml
spec:
  serviceAccountName: jenkins
  imagePullSecrets:
    - name: regcred
  containers:
    - name: ubuntu2004
      image: 'ourartifacts.jfrog.io/docker-local/jenkins-remote-agents:ubuntu2004'
      imagePullPolicy: Always
      command:
        - sleep
      args:
        - 99d
      tty: true
```
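The same YAML can be exercised outside Jenkins to see which UID the container gets on its own (a sketch; `-n jenkins` is an assumed namespace, adjust as needed):

```sh
# Stand the pod up directly and check the effective user inside the container.
kubectl apply -n jenkins -f ubuntu-pod.yaml
kubectl exec -n jenkins ubuntu-pod-yaml -c ubuntu2004 -- id
kubectl delete pod -n jenkins ubuntu-pod-yaml
```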
And the pipeline:

```groovy
// load shared library via @Library or other methods
@Library('SCMLibraries@ubuntu-pod-tests') _ // Load External Libraries
def podDefs = libraryResource('./ubuntu-pod.yaml')

pipeline {
    agent any
    stages {
        stage('Pipeline Start') {
            stages {
                stage('SCM Library') {
                    agent {
                        kubernetes {
                            defaultContainer 'jnlp'
                            yaml podDefs
                        }
                    }
                    steps {
                        sh 'ls -la'
                        sh 'whoami'
                        sh 'echo $UID'
                        container('ubuntu2004') { // this is where we start to see the errors shown above
                            sh 'whoami'
                            sh 'cat /etc/os*'
                        }
                    }
                }
            }
        }
    }
}
```
This method has worked for every agent image we defined up until August. The problem did not appear until a couple of weeks ago, when we restarted the process to build new images for projects we want to host on the new Jenkins instance. The Dockerfile shown above is the same one we used, before August, to define an image that still works.
To get more detail, we have set the controller system property

```
org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true
```
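(A minimal sketch of passing the flag to the controller JVM; the launch command here is illustrative, since in practice it is injected through the controller's Java options:)

```sh
# Illustrative only: hand the Durable Task diagnostics flag to the controller JVM.
java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true \
     -jar jenkins.war
```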
With that set, we can see additional output in the console log; this is what exposes the permission failures shown above. What is causing these errors? What are we missing? Are there any other details that would make this clearer?
ARGH! (In a good way, maybe.)
I found the answer/workaround buried in this question. I added `runAsUser` to the pod definition:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-pod-yaml
spec:
  serviceAccountName: jenkins
  imagePullSecrets:
    - name: regcred
  containers:
    - name: ubuntu2004
      image: 'ourartifacts.jfrog.io/docker-local/jenkins-remote-agents:ubuntu2004'
      imagePullPolicy: Always
      command:
        - sleep
      args:
        - 99d
      tty: true
      securityContext:
        runAsUser: 0
```
The errors went away and the pipeline is now working.
HOWEVER, while this is a good workaround, it does not fully address the original problem: all of the containers that normally work already run as user 0 on startup, without any such intervention in the pod container definition.
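The failing paths hint at the mechanism: the `@tmp` durable-script directory is created by the `jnlp` container, so if the two containers resolve to different UIDs, writes from the custom container are denied. A quick comparison inside a live agent pod (the pod name is a placeholder):

```sh
# Compare the effective UIDs of the default jnlp container and our container.
kubectl exec <agent-pod> -c jnlp -- id -u
kubectl exec <agent-pod> -c ubuntu2004 -- id -u

# Check who owns the workspace tree the durable-task wrapper writes into.
kubectl exec <agent-pod> -c ubuntu2004 -- ls -ld /home/jenkins/agent/workspace
```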
UPDATE: after much head-scratching we were able to zero in on the original problem. During the builds of the various Docker images for Jenkins agents, one of the base images used had its `USER` set to jenkins. That layer was cached and re-used over and over again. Once we added `--no-cache` to the Docker build parameters, the issue went away, which means we no longer have to add the following lines to the YAML file:
```yaml
securityContext:
  runAsUser: 0
```
Because of the large number of users, we have decided to leave those lines in for clarity and have added information to our internal docs to highlight what is being done.
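For posterity, the cache check and the fix boil down to something like this (the image tag is ours; the `grep` is just a rough way to spot a stray `USER` layer in the history):

```sh
# Look for a cached metadata layer that set USER (rough filter).
docker history --no-trunc \
  ourartifacts.jfrog.io/docker-local/jenkins-remote-agents:ubuntu2004 | grep 'USER'

# Rebuild from scratch so no stale base layers are silently re-used.
docker build --no-cache \
  -t ourartifacts.jfrog.io/docker-local/jenkins-remote-agents:ubuntu2004 .
```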