Tags: amazon-web-services, kubernetes, kubernetes-helm, kube-prometheus-stack, kube-state-metrics

Helm kube-prometheus-stack stuck in pending-install


Problem: For some reason, the Helm release of kube-prometheus-stack is stuck in pending-install status. What is the correct way to install a Helm release for it using the helm CLI?

Details:

Because the k8s.gcr.io image registry was frozen, I had to switch the kube-state-metrics image to registry.k8s.io by updating values.yaml as follows:

kube-state-metrics:
  prometheusScrape: true
  image:
    repository: registry.k8s.io/kube-state-metrics/kube-state-metrics
    tag: v1.9.8
    pullPolicy: Always
  namespaceOverride: ""
  rbac:
    create: true
  podSecurityPolicy:
    enabled: true

After that, when I tried to update the Helm release for kube-prometheus-stack using the same version, 14.9.0, the release ended up in Failed status. Upon retrying, it deleted the previous Helm release and created a new one. All the components of the new release were created successfully, but the release got stuck in pending-install status.
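For reference, a typical way to apply an updated values.yaml at a pinned version with the helm CLI is sketched below. The exact command that was run is not given here, so the namespace (`monitoring`), the repo alias (`prometheus-community`), and the values file name are assumptions to adjust. The script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Assumed values: the namespace and the repo alias are not stated in the question.
NAMESPACE="${NAMESPACE:-monitoring}"
RELEASE="${RELEASE:-kube-prometheus-stack}"
CHART="prometheus-community/kube-prometheus-stack"
VERSION="14.9.0"           # version named in the question
VALUES_FILE="values.yaml"  # file holding the kube-state-metrics override

# Build the upgrade command as a string so it can be inspected before running.
upgrade_cmd() {
  echo "helm upgrade --install $RELEASE $CHART --version $VERSION --namespace $NAMESPACE --values $VALUES_FILE"
}

# Print the command; nothing is changed in the cluster at this point.
upgrade_cmd
```

To actually run it, `eval "$(upgrade_cmd)"` would work, assuming the chart repository was added beforehand with `helm repo add prometheus-community https://prometheus-community.github.io/helm-charts`.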

I waited for almost 30 minutes, but the status did not change. I also tried deleting the Helm release, rolling it back, and deleting the Helm release secret, all without success.

What could be the issue? How can I solve it?


Solution

  • Solution: After some investigation, I found a job named kube-prometheus-stack-admission-patch that was failing with a BackoffLimitExceeded error. Judging by its log, it is an initialization job that patches the admission webhook configurations. Deleting the job (not just its pod) fixed the issue, and the Helm release changed its status to Deployed.

    Error Log in kube-prometheus-stack-admission-patch job:

    W0331 10:58:03.079451       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
    {"level":"info","msg":"patching webhook configurations 'kube-prometheus-stack-admission' mutating=true, validating=true, failurePolicy=Fail","source":"k8s/k8s.go:39","time":"2023-03-31T10:58:03Z"}
    {"err":"the server could not find the requested resource","level":"fatal","msg":"failed getting validating webhook","source":"k8s/k8s.go:48","time":"2023-03-31T10:58:03Z"}
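The steps described above can be sketched as a small script. This is a minimal sketch, not a definitive procedure: the namespace (`monitoring`) is an assumption, and the `run` helper and `APPLY` variable are hypothetical names introduced here. By default the script only prints each command, so the destructive `kubectl delete` can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Assumed values: the namespace is a guess; the post does not name it.
NAMESPACE="${NAMESPACE:-monitoring}"
RELEASE="${RELEASE:-kube-prometheus-stack}"
JOB="${RELEASE}-admission-patch"   # job name reported in the answer

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

fix_pending_install() {
  # 1. Check the release state and list jobs to spot the failed one.
  run helm status "$RELEASE" --namespace "$NAMESPACE"
  run kubectl get jobs --namespace "$NAMESPACE"
  # 2. Read the job's log to confirm the failure reason.
  run kubectl logs "job/$JOB" --namespace "$NAMESPACE"
  # 3. Delete the job itself (not just its pod), as described above.
  run kubectl delete job "$JOB" --namespace "$NAMESPACE"
  # 4. Re-check the release; the answer reports it then became Deployed.
  run helm status "$RELEASE" --namespace "$NAMESPACE"
}

fix_pending_install
```

Running it with `APPLY=1` executes the commands instead of printing them. Whether deleting the job is enough in another cluster depends on why the patch job failed there.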