Tags: kubernetes, kube-scheduler

CrashLoopBackOff for kube-scheduler Due to Missing Service Token


I have a problem with my Kubernetes cluster: the kube-scheduler pod is stuck in the 'CrashLoopBackOff' state and I am unable to rectify it. The logs are complaining about a missing service account token:

kubectl logs kube-scheduler-master -n kube-system
I1011 09:01:04.309289       1 serving.go:319] Generated self-signed cert in-memory
W1011 09:01:20.579733       1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W1011 09:01:20.579889       1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W1011 09:01:20.579917       1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W1011 09:01:20.579990       1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W1011 09:01:20.580040       1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
invalid configuration: no configuration has been provided

Can anyone please explain what /var/run/secrets/kubernetes.io/serviceaccount/token is, where it is supposed to be stored (is the path on the host or inside the container?), and how I go about regenerating it?

I'm running version 1.15.4 across all of my nodes, which were set up using kubeadm. I have stupidly upgraded the cluster since this error first started (I read that it could possibly be a bug in the version I was using); I was previously on version 1.14.*.

Any help would be greatly appreciated; everything runs on this cluster and I feel like my arms have been cut off without it.

Thanks in advance,

Harry


Solution

  • It turns out that, because the pod is kube-scheduler, the /var/run/secrets/kubernetes.io/serviceaccount/token that the logs refer to is mounted from /etc/kubernetes/scheduler.conf on the master node.
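
    A quick way to confirm where that file comes from (paths assume the default kubeadm layout; the scheduler runs as a static pod managed by the kubelet):

    # the static pod manifest shows the scheduler being pointed at scheduler.conf
    grep -n 'scheduler.conf' /etc/kubernetes/manifests/kube-scheduler.yaml

    # check that the file actually exists and is non-empty
    ls -l /etc/kubernetes/scheduler.conf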

    For whatever reason, this was a completely empty file in my cluster. I regenerated it by following the kube-scheduler instructions from Kubernetes The Hard Way.

    I ran the following in the /etc/kubernetes/pki directory (where the original CAs remained):

    {
    
    # define a CSR whose CN/O will become the scheduler's RBAC user/group
    cat > kube-scheduler-csr.json <<EOF
    {
      "CN": "system:kube-scheduler",
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "US",
          "L": "Portland",
          "O": "system:kube-scheduler",
          "OU": "Kubernetes The Hard Way",
          "ST": "Oregon"
        }
      ]
    }
    EOF
    
    # sign a client certificate for the scheduler with the existing cluster CA
    cfssl gencert \
      -ca=ca.pem \
      -ca-key=ca-key.pem \
      -config=ca-config.json \
      -profile=kubernetes \
      kube-scheduler-csr.json | cfssljson -bare kube-scheduler
    
    }
    

    which generates kube-scheduler-key.pem and kube-scheduler.pem.
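
    Before wiring the new certificate into a kubeconfig, it's worth a quick sanity check on its subject (plain openssl, nothing cfssl-specific):

    # the CN must be system:kube-scheduler so the API server maps the client
    # certificate to the user that the built-in RBAC bindings expect
    openssl x509 -in kube-scheduler.pem -noout -subject -issuer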

    Next, I needed to generate the new kubeconfig file, following the kubeconfig instructions from the same guide.

    I ran:

    {
      kubectl config set-cluster kubernetes-the-hard-way \
        --certificate-authority=ca.pem \
        --embed-certs=true \
        --server=https://127.0.0.1:6443 \
        --kubeconfig=kube-scheduler.kubeconfig
    
      kubectl config set-credentials system:kube-scheduler \
        --client-certificate=kube-scheduler.pem \
        --client-key=kube-scheduler-key.pem \
        --embed-certs=true \
        --kubeconfig=kube-scheduler.kubeconfig
    
      kubectl config set-context default \
        --cluster=kubernetes-the-hard-way \
        --user=system:kube-scheduler \
        --kubeconfig=kube-scheduler.kubeconfig
    
      kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig
    }
    

    which generates kube-scheduler.kubeconfig; I renamed this and moved it to /etc/kubernetes/scheduler.conf.
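
    For completeness, the rename/move is just a mv, and since the pod is already crash-looping it should pick the new file up on its next restart; deleting the mirror pod (name as in the question) is one way to force an immediate retry, though your setup may differ:

    mv kube-scheduler.kubeconfig /etc/kubernetes/scheduler.conf

    # the scheduler is a static pod, so deleting it just makes the kubelet
    # start a fresh copy that reads the new scheduler.conf
    kubectl delete pod kube-scheduler-master -n kube-system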

    It was then just a case of reading the logs from the pod (kubectl logs kube-scheduler-xxxxxxx -n kube-system), which complained about various things still missing from the configuration file.

    These were the 'clusters' and 'contexts' blocks of the YAML which I copied from one of the other configuration files in the same directory (after verifying that they were all identical).
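
    If you want to eyeball those blocks yourself, the other kubeadm-generated kubeconfigs sit alongside scheduler.conf, so something like this shows what to compare against and copy from (assuming the default /etc/kubernetes layout):

    # print the 'clusters' and 'contexts' blocks of a sibling kubeconfig
    grep -A 4 '^clusters:' /etc/kubernetes/controller-manager.conf
    grep -A 4 '^contexts:' /etc/kubernetes/controller-manager.conf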

    After copying those into scheduler.conf, the errors stopped and everything kicked back into life.
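
    To confirm the scheduler really is healthy again, something along these lines does the trick (pod name as in the question):

    kubectl get pods -n kube-system | grep kube-scheduler
    kubectl logs kube-scheduler-master -n kube-system --tail=20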