amazon-web-services, kubernetes, amazon-eks, daemonset

Kubernetes DaemonSet Pods schedule on all nodes except one


I'm trying to deploy a Prometheus node-exporter DaemonSet in my AWS EKS Kubernetes cluster.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: prometheus
    chart: prometheus-11.12.1
    component: node-exporter
    heritage: Helm
    release: prometheus
  name: prometheus-node-exporter
  namespace: operations-tools-test
spec:
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
      release: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        chart: prometheus-11.12.1
        component: node-exporter
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --web.listen-address=:9100
        image: prom/node-exporter:v1.0.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: prometheus-node-exporter
      serviceAccountName: prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys

After deploying it, however, it's not getting scheduled on one node.

The pod.yml for the Pod that should land on that node looks like this:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  generateName: prometheus-node-exporter-
  labels:
    app: prometheus
    chart: prometheus-11.12.1
    component: node-exporter
    heritage: Helm
    pod-template-generation: "1"
    release: prometheus
  name: prometheus-node-exporter-xxxxx
  namespace: operations-tools-test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: prometheus-node-exporter
  resourceVersion: "51496903"
  selfLink: /api/v1/namespaces/namespace-x/pods/prometheus-node-exporter-xxxxx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - ip-xxx-xx-xxx-xxx.ec2.internal
  containers:
  - args:
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    - --web.listen-address=:9100
    image: prom/node-exporter:v1.0.1
    imagePullPolicy: IfNotPresent
    name: prometheus-node-exporter
    ports:
    - containerPort: 9100
      hostPort: 9100
      name: metrics
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/proc
      name: proc
      readOnly: true
    - mountPath: /host/sys
      name: sys
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-node-exporter-token-xxxx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: prometheus-node-exporter
  serviceAccountName: prometheus-node-exporter
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /proc
      type: ""
    name: proc
  - hostPath:
      path: /sys
      type: ""
    name: sys
  - name: prometheus-node-exporter-token-xxxxx
    secret:
      defaultMode: 420
      secretName: prometheus-node-exporter-token-xxxxx
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-11-06T23:56:47Z"
    message: '0/4 nodes are available: 2 node(s) didn''t have free ports for the requested
      pod ports, 3 Insufficient pods, 3 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

As seen above, the Pod's nodeAffinity matches on the metadata.name field, and the value is exactly my node's name.
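
A quick way to sanity-check that field is to read it back directly (minimal sketch; the node name is redacted the same way as above):

 kubectl get node ip-xxx-xx-xxx-xxx.ec2.internal -o jsonpath='{.metadata.name}'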

But when I run the below command,

 kubectl describe po prometheus-node-exporter-xxxxx

I see the following in the events:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  60m                   default-scheduler  0/4 nodes are available: 1 Insufficient pods, 3 node(s) didn't match node selector.
  Warning  FailedScheduling  4m46s (x37 over 58m)  default-scheduler  0/4 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 Insufficient pods, 3 node(s) didn't match node selector.
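
The same scheduling messages can also be pulled from the namespace events (minimal sketch; the Pod name is redacted as above):

 kubectl get events -n operations-tools-test --field-selector involvedObject.name=prometheus-node-exporter-xxxxx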

I have also checked the CloudWatch logs for the scheduler, and I don't see any logs for my failed Pod.

The node has ample resources left:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         520m (26%)  210m (10%)
  memory                      386Mi (4%)  486Mi (6%)

I don't see a reason why it should not schedule a pod. Can anyone help me with this?

TIA


Solution

  • As posted in the comments:

    Please add to the question the steps that you followed (editing any values in the Helm chart etc). Also please check if the nodes are not over the limit of pods that can be scheduled on it. Here you can find the link for more reference: LINK.

    There are no processes occupying port 9100 on the given node. @DawidKruk The Pod limit was reached. Thanks! I expected the scheduler to give me an error about that rather than the vague "node selector not matching" message.
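
    For reference, the free-port check mentioned above can be done roughly like this (a sketch; the node name is a placeholder, and the first command runs on the node itself, e.g. over SSH or SSM):

    # On the node: is anything already listening on the node-exporter hostPort?
    sudo ss -lntp | grep ':9100'

    # From the cluster: list the pods already scheduled on that node
    # (another workload holding hostPort 9100 would show up here)
    kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=ip-xxx-xx-xxx-xxx.ec2.internal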


    The issue that Pods couldn't be scheduled on the nodes (Pending state) was connected with the Insufficient pods message in the $ kubectl get events command.

    That message is displayed when a node has reached its maximum capacity of pods (for example: node1 can schedule a maximum of 30 pods).

    Not really sure why the other messages (node(s) didn't have free ports for the requested pod ports, node(s) didn't match node selector) were displayed.
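
    A quick way to compare each node's pod capacity with what is already scheduled on it (a sketch; the awk column assumes the default kubectl -o wide layout):

    # Pod capacity reported by each node
    kubectl get nodes -o custom-columns='NAME:.metadata.name,PODS:.status.capacity.pods'

    # Number of pods currently assigned to each node (column 8 of -o wide is NODE)
    kubectl get pods --all-namespaces -o wide | awk 'NR>1 {print $8}' | sort | uniq -c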


    More on Insufficient pods can be found in this GitHub issue comment:

    That's true. That's because the CNI implementation on EKS. Max pods number is limited by the network interfaces attached to instance multiplied by the number of ips per ENI - which varies depending on the size of instance. It's apparent for small instances, this number can be quite a low number.

    Docs.aws.amazon.com: AWS EC2: User Guide: Using ENI: Available IP per ENI

    -- Github.com: Kubernetes: Autoscaler: Issue 1576: Comment 454100551
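
    As a rough illustration of that formula (the ENI and IP counts below are assumed from the ENI limits table linked above, so double-check them for your instance type):

    # max pods per node = ENIs * (IPv4 addresses per ENI - 1) + 2
    # e.g. a t3.medium with 3 ENIs and 6 IPv4 addresses per ENI: 3 * (6 - 1) + 2 = 17 pods
    # The value actually configured on a node shows up in its capacity:
    kubectl get node ip-xxx-xx-xxx-xxx.ec2.internal -o jsonpath='{.status.capacity.pods}'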


    Additional resources: