kubernetes, deployment, k3s

Kubernetes taint on master but no scheduling on worker node


I have an issue on my Kubernetes (K3s) cluster:

0/4 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {k3s-controlplane: true}, that the pod didn't tolerate.

To describe how that happened: I have 4 K3s servers, 3 of them control-plane nodes and 1 a worker.

Originally, no nodes had taints, so every pod was able to schedule on any node.
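
(For reference, node taints can be checked with something like this; a sketch that just greps the Taints field out of the node description:)

kubectl describe nodes | grep -i taints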

I wanted to change that and taint my master nodes, so I added the taint k3s-controlplane=true:NoSchedule on 2 nodes.
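
(A sketch of how such a taint is applied with kubectl, assuming the two masters are baal-01 and baal-02, as shown further down:)

kubectl taint nodes baal-01 k3s-controlplane=true:NoSchedule
kubectl taint nodes baal-02 k3s-controlplane=true:NoSchedule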

To test it, I restarted one deployment, and now that pod won't schedule.

As I understand it, the pod should schedule on the untainted nodes by default, but that does not seem to be the case.
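
(For contrast, the opposite behaviour, letting a pod run on the tainted masters anyway, would require a toleration in the pod spec; a sketch, not something my deployment has:)

tolerations:
- key: "k3s-controlplane"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"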

For a new deployment, it works fine.

So I guess there is something left over in my deployment that creates the issue. The deployment is fairly simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: test
    spec:
      nodeSelector:
        type: "slow"
      containers:
      - env:
        - name: PUID
          value: "1000"
        - name: GUID
          value: "1000"
        - name: TZ
          value: Europe/Paris
        - name: AUTO_UPDATE
          value: "true"
        image: test/test
        imagePullPolicy: Always
        name: test
        volumeMounts:
        - mountPath: /config
          name: vol0
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
      volumes:
      - name: vol0
        persistentVolumeClaim:
          claimName: test-config-lh
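
(For debugging, the error message above comes from the pending pod's events; something like this surfaces them, assuming the pod carries the app=test label from the template:)

kubectl describe pod -l app=test
# The Events section shows the FailedScheduling warning quoted above:
#   0/4 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, ...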

Solution

  • Well, this particular deployment had a nodeSelector, type=slow, which is exactly the label carried by those two nodes...

    If I use this command:

    kubectl get nodes --show-labels
    NAME      STATUS   ROLES                       AGE    VERSION        LABELS
    baal-01   Ready    control-plane,etcd,master   276d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=baal-01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=slow
    baal-02   Ready    control-plane,etcd,master   276d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=baal-02,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=slow
    lamia01   Ready    control-plane,etcd,master   187d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=lamia01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=fast
    lamia03   Ready    <none>                      186d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=lamia03,kubernetes.io/os=linux,node.kubernetes.io/instance-type=k3s,ram=full,type=fast
    

    You can see the label type=slow on the two nodes baal-01 and baal-02, and those two nodes are the ones carrying the NoSchedule taint.

    So the deployment was trying to schedule its pod on a node with the label type=slow, and none of the schedulable nodes had this label (see the relabelling sketch at the end).

    Sorry, I missed it... so no real issue there: the pod simply had nowhere it was allowed to go.
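
    If that had not been intentional, the fix would be either removing the nodeSelector from the deployment or labelling a schedulable node to match; a sketch of the latter (lamia03 is the worker from the output above, and --overwrite is needed because it already carries type=fast):

    kubectl label nodes lamia03 type=slow --overwrite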