kubernetes, jenkins, artifactory

Super Odd Behavior with Helm/Jenkins/Artifactory YAML


Using YAML (with Helm), we created the following file to define agent containers. This file, as it stands below, works correctly, as does another file with a different set of agent definitions.

apiVersion: v1
kind: Pod
metadata:
  name: pod-yaml
spec:
  containers:
  - name: maven
    image: maven:3.8.1-jdk-8
    command:
    - sleep
    args:
    - 99d
  - name: python
    image: python:latest
    command:
    - sleep
    args:
    - 99d
  - name: node10jdk8
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node10_jdk8:v4
    command:
    - sleep
    args:
    - 99d
  - name: node10jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node10_jdk11:v2
    command:
    - sleep
    args:
    - 99d
  - name: node12jdk8
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node12_jdk8:v2
    command:
    - sleep
    args:
    - 99d
  - name: node12jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node12_jdk11:v2
    command:
    - sleep
    args:
    - 99d
  - name: node14jdk8
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node14_jdk8:v2
    command:
    - sleep
    args:
    - 99d
  - name: node16jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node16_jdk11:v2
    command:
    - sleep
    args:
    - 99d
  - name: node18jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node18_jdk11:v2
    command:
    - sleep
    args:
    - 99d
  - name: node20jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node20_jdk11:v2
    command:
    - sleep
    args:
    - 99d
  - name: jra-base
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_base:v3
    command:
    - sleep
    args:
    - 99d

We had several more containers defined, but with those additional definitions in place we would get errors like this when running a Jenkins pipeline:

14:26:47  Created Pod: kubernetes jenkins-dev/agents-jenkins-yaml-agents-test-128-tlk8h-tnw11-gf62j
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_base:v3". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node12_jdk11:v2". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node12_jdk8:v2". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node14_jdk8:v2". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node16_jdk11:v2". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node18_jdk11:v2". Check if image tag name is spelled correctly.
14:26:53  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node20_jdk11:v2". Check if image tag name is spelled correctly.
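
As it turned out later, this message from the Kubernetes plugin generally just means the image pull for that container failed; it does not necessarily mean the tag is misspelled. If registry credentials are a suspect, one way to test that theory is to attach a pull secret directly to the pod spec. The following is only a sketch, and it assumes a docker-registry secret (here called artifactory-pull, a made-up name) already exists in the agents' namespace:

apiVersion: v1
kind: Pod
metadata:
  name: pod-yaml
spec:
  imagePullSecrets:
  - name: artifactory-pull   # hypothetical secret holding the Artifactory registry credentials
  containers:
  - name: jra-base
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_base:v3
    command:
    - sleep
    args:
    - 99d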

I took all of the offending container images, placed them into a separate YAML file, and ran the test again; it worked.

I decided then that I would add one more image, test, and then lather, rinse, repeat. I added one image:

  - name: node16jdk17
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node16_jdk17:v2
    command:
    - sleep
    args:
    - 99d

The error appeared again, but not for the new container definition that had been added. I removed that definition and it ran perfectly again. I picked another image to try and added this:

  - name: node14jdk11
    image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node14_jdk11:v2
    command:
    - sleep
    args:
    - 99d

It failed again, but this time only the new image showed as failing:

13:10:58  [Pipeline] node
13:11:08  Created Pod: kubernetes jenkins-dev/agents-jenkins-yaml-agents-test-125-93416-q6bzf-bqr63
13:11:12  ERROR: Unable to pull Docker image "ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node14_jdk11:v2". Check if image tag name is spelled correctly.
13:11:12  [Pipeline] // node

What am I missing here? The YAML files aren't approaching any length restriction, as far as I know. The image tag names must be "spelled correctly", since the same images are pulled successfully when none of these additions are made. I have checked and double-checked the syntax, and there are no weird characters in the file.

Am I missing something really obvious?

UPDATE

Here is a bare-bones version of a pipeline using the pod/container definitions:

@Library('SCMLibraries@jenkins-pod-tests')_ // Load External Libraries
def podDefs = libraryResource('./pod.yaml') 
pipeline {
    agent any
    environment {
        PATH = '/var/jenkins/.nvm/versions/node/v10.24.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
        NVM_DIR = '/var/jenkins/.nvm'
        NVM_INC = '/var/jenkins/.nvm/versions/node/v10.24.1/include/node'
        NVM_CD_FLAGS = ''
        NVM_BIN = '/var/jenkins/.nvm/versions/node/v10.24.1/bin'
    }
    stages {
        stage('Setup the Build') {
            agent {
                kubernetes {
                    defaultContainer 'jnlp'
                    yaml podDefs
                }
            }
            steps {
                container('node10jdk8') {
                    // code goes here
                }
            }
        }
    }
}

Not shown is the parallel processing pipeline in which multiple containers would be used.


Solution

  • Because pod.yaml is loaded as a resource, it does not inherit the service account used by the Jenkins instance on Kubernetes. We created a separate service account for the agent pods and then referenced it in the YAML file:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-yaml
    spec:
      serviceAccountName: agent-pods
      containers:
      - name: maven
        image: maven:3.8.1-jdk-8
        command:
        - sleep
        args:
        - 99d
      - name: python
        image: python:latest
        command:
        - sleep
        args:
        - 99d
      - name: node10jdk8
        image: ourartifacts.jfrog.io/docker-local/jenkins_remote_agent_node10_jdk8:v4
        command:
        - sleep
        args:
        - 99d
    

    pod.yaml had been using the default service account, which did not have access to some of the resources. (This also highlighted that we need to determine why it did allow some of them, but that's another story.) Adding a dedicated service account for these pods solved the issue.
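
    For completeness: the usual way to give a service account pull access to a private registry is to attach an image pull secret to it. The sketch below is a reconstruction rather than the exact object we deployed; the secret name artifactory-pull is hypothetical and would have to exist as a docker-registry secret in the jenkins-dev namespace first.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: agent-pods
      namespace: jenkins-dev
    imagePullSecrets:
    - name: artifactory-pull   # hypothetical secret created from the Artifactory credentials

    Any pod that then runs with serviceAccountName: agent-pods inherits that pull secret automatically, so it does not need to be repeated in every pod definition.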