
Should the metrics-server pod run on master node(s) or worker node(s)?


I am new to k8s and am trying to deploy the dashboard on the master node(s); part of the deployment is launching the metrics-server. The full documentation can be found here (dashboard/metrics-server).

My question is related to the warning that we can see immediately after deployment:

$ kubectl describe pods -n kube-system metrics-server-74d7f54fdc-psz5p
Name:           metrics-server-74d7f54fdc-psz5p
Namespace:      kube-system
Priority:       0
Node:           <none>
Labels:         k8s-app=metrics-server
                pod-template-hash=74d7f54fdc
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/metrics-server-74d7f54fdc
Containers:
  metrics-server:
    Image:      my.repo.net/k8s.gcr.io/metrics-server-amd64:v0.3.6
    Port:       4443/TCP
    Host Port:  0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-d47dm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-d47dm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-d47dm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/arch=amd64
                 kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  116s (x49 over 66m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

After reading other questions, e.g. "Node had taints that the pod didn't tolerate error when deploying to Kubernetes cluster" and "1 node(s) had taints that the pod didn't tolerate in kubernetes cluster", I understand why this problem occurs, but I am unsure whether we should add this toleration to the deployment manifest ourselves, e.g. (https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.3.7):

tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"

If the master node should be able to collect metrics about itself, shouldn't this parameter be added by default? If not, we would have to deploy the UI on all workers (this does not make any sense).

Maybe someone with more experience can shed some light on this?


Solution

  • The metrics server can be deployed on a worker node; it does not have to run on the master node to fetch metrics about the master. metrics-server collects the metrics from the kubelet on every node and serves them through the kube-apiserver. The requirements for metrics-server are:

    1. It must be reachable from the kube-apiserver.
    2. Kubelet authorization must be set up properly (refer to this link).
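
Both requirements can be checked on a running cluster roughly as sketched below. This assumes the stock manifest: the APIService is named v1beta1.metrics.k8s.io and the pods carry the k8s-app=metrics-server label seen in the question. The function name check_metrics_api is only a helper name chosen for this example.

```shell
# Hypothetical helper: reports whether the Metrics API is registered and answering.
check_metrics_api() {
  # Which node did the metrics-server pod land on, and is it Running?
  kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide
  # The AVAILABLE column should read True once the apiserver can reach the pod
  kubectl get apiservice v1beta1.metrics.k8s.io
  # Returns per-node CPU/memory only if scraping the kubelets works end to end
  kubectl top nodes
}
```

Calling check_metrics_api once the pod has been scheduled should show the node it runs on; if kubectl top nodes also lists the master, metrics for the master are being collected regardless of where the pod itself runs.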

    Do you have a worker node in your cluster? Is there any taint applied to those nodes? Also, per your deployment YAML, the node selector is configured with the values kubernetes.io/arch=amd64 and kubernetes.io/os=linux, so please make sure that your worker nodes have these 2 labels.
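
A quick way to answer these questions is sketched here. The label selector uses the two node-selector values from the pod description in the question; show_schedulable_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: shows which nodes the metrics-server pod could land on.
show_schedulable_nodes() {
  # Every node with the keys of its taints (<none> means the node is untainted)
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
  # Only the nodes that carry both labels required by the node selector
  kubectl get nodes -l kubernetes.io/arch=amd64,kubernetes.io/os=linux
}
```

A node has to appear in the second listing and show no blocking taint in the first one before the scheduler will place the pod on it.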

    You can add the labels to a node (if not present) using the commands below.

    kubectl label nodes <node-name> kubernetes.io/arch=amd64
    kubectl label nodes <node-name> kubernetes.io/os=linux
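
If the cluster really has only the tainted master (the scheduler event in the question reports 0/1 nodes available), labels alone will not help and the pod needs a toleration for node-role.kubernetes.io/master. A minimal sketch follows, assuming the Deployment is named metrics-server in kube-system (as the ReplicaSet name suggests) and that the taint has the NoSchedule effect, which is the kubeadm default. The variable and function names are made up for this example.

```shell
# Assumption: the master taint has effect NoSchedule (the kubeadm default).
TOLERATION_PATCH='{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}}}}'

# Hypothetical helper: lets the metrics-server pod tolerate the master taint.
tolerate_master_taint() {
  kubectl -n kube-system patch deployment metrics-server -p "$TOLERATION_PATCH"
}
```

Note that the patch sets the whole tolerations list of the pod template, so any tolerations already declared in the Deployment would need to be included in the patch as well. On a cluster with workers this step should not be needed.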