Tags: kubernetes, prometheus, kubernetes-helm, prometheus-node-exporter, kube-state-metrics

kube-prometheus-stack upgrade failed with errors: "failed calling webhook x509: certificate signed by unknown authority" and "field is immutable"


I just upgraded kube-prometheus-stack on my Kubernetes cluster (Helm chart, deployed through Terraform) and started seeing the following two errors:

Error 1:

failed calling webhook "prometheusrulemutate.monitoring.coreos.com": 
failed to call webhook: 
Post "https://kube-prometheus-stack-operator.infra.svc:443/admission-prometheusrules/mutate?timeout=30s": 
x509: certificate signed by unknown authority

Error 2:

Error: cannot patch "kube-prometheus-stack-prometheus-node-exporter" with kind DaemonSet: 
DaemonSet.apps "kube-prometheus-stack-prometheus-node-exporter" is invalid: spec.selector: 
Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"kube-prometheus-stack", "app.kubernetes.io/name":"prometheus-node-exporter"}, 
MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && 
cannot patch "kube-prometheus-stack-kube-state-metrics" with kind Deployment: 
Deployment.apps "kube-prometheus-stack-kube-state-metrics" is invalid: spec.selector: 
Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"kube-prometheus-stack", "app.kubernetes.io/name":"kube-state-metrics"}, 
MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

The table below shows the upgrade path, with the old and new versions of the kube-prometheus-stack image and chart:

Component   Old version (14 April 2021)   New version (24 July 2023)
Image       v0.46.0                       v0.66.0
Chart       14.9.0                        48.2.0

How can I fix these two errors?


Solution

  • Fix for Error 1:

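    To fix the first error, I set failurePolicy to Ignore under prometheusOperator.admissionWebhooks in the values file for the Helm chart, as follows. With Ignore, the API server lets the request through when the webhook call fails, so the certificate error no longer blocks the upgrade.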

    prometheusOperator:
      enabled: true
      admissionWebhooks:
        failurePolicy: "Ignore"  # previously "Fail"
    

  • Fix for Error 2:

    To fix the second error, I set enabled to false for both kubeStateMetrics and nodeExporter in the values file and applied the Helm chart, then set enabled back to true for both and applied it again. That worked. Disabling the two components deletes the existing Deployment and DaemonSet, and re-enabling them recreates both from scratch. Deletion seems to have been required because spec.selector is immutable on these resources, so Helm cannot patch them in place when the newer chart renders different selector labels. I am not sure whether the label change comes from the chart itself or from an incorrect setting in my values file during the upgrade.
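
    The two passes for Error 2 can be sketched as a values fragment. This is a minimal sketch, assuming the top-level kubeStateMetrics and nodeExporter keys named above and leaving every other value untouched. Node and kube-state metrics are not collected between the two applies.

    ```yaml
    # Pass 1: apply with both exporters disabled so that Helm deletes the
    # existing Deployment and DaemonSet, and with them the immutable selectors.
    kubeStateMetrics:
      enabled: false
    nodeExporter:
      enabled: false

    # Pass 2: change both values back to `enabled: true` and apply again.
    # The resources are then recreated with the selector labels rendered
    # by the new chart version.
    ```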

    Reference: kube-prometheus-stack / v48.2.0