kubernetes, kubernetes-helm, bitbucket-pipelines, hpa

Helm tries to deploy HPA, despite it being explicitly disabled


I'm currently making some changes to the Bitbucket pipelines for our microservices. In particular, the changes concern using a custom image we built as the build image for the Helm step.

The code for the step is the following.

- step:
        name: Deploy to Kubernetes
        size: 2x
        image:
          name: <TEST_REGISTRY>/<CUSTOM_IMAGE>:<TAG>
          username: <USER>
          password: <PASSWORD>
        script:
          - export REGISTRY=<TEST_REGISTRY>
          - export IMAGE_NAME=<MS_NAME>
          - export TAG=<TAG>
          - echo <TEST_KUBECONFIG> | base64 -d > <KUBE_CONFIG>
          - export KUBECONFIG=<KUBE_CONFIG>
          - chmod 700 <KUBE_CONFIG>
          - kubectl get pods -n <NAMESPACE>
          - helm version
          - helm repo add <REPO_NAME> <REPO_URL>
          - helm repo update
          - helm search repo <REPO_NAME>
          - helm -n <NAMESPACE> upgrade <MS_NAME> <REPO_NAME>/<MS_NAME> --set env=STAG --set image.registry=$REGISTRY --set image.repository=$IMAGE_NAME --set image.tag=$TAG --set autoscaling.enabled=false --set autoscaling.min=1 --set autoscaling.max=6 --set replicaCount=1 --wait --debug

Note the flag --set autoscaling.enabled=false in the helm upgrade command. We set it because we don't want to deploy the HPA yet, so we wrapped the HPA template in the guard {{- if .Values.autoscaling.enabled }}. With autoscaling.enabled set to false, the template renders nothing, so no attempt to deploy the HPA should occur.

This trick worked perfectly fine for all the other microservices (we don't want to deploy their HPAs yet either), except for the last one, whose pipeline failed with the following error:

Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "<MS_NAME>" namespace: "" from "": no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta2"

The error itself is accurate: the HPA is currently defined in its manifest with apiVersion: autoscaling/v2beta2, and the current version of the Kubernetes cluster does not serve that API.

However, the other microservices, which received the same pipeline changes, also define their HPA with that API version, yet their pipelines worked fine: with --set autoscaling.enabled=false, Helm did not try to deploy that resource into the cluster.

I fail to understand why the changes worked fine for every microservice except this one. No other changes were made (the same goes for the other microservices). I've checked that the names of the values passed with --set in the pipeline match the ones in values.yaml, and I've also checked all the other names (image name of the service to be upgraded, image registry, tags, ...): everything is correct. Again, no other changes were made in this or the other services (no changes in their Charts, their Dockerfiles, or their application code; the only changes are in the Helm step of the pipelines).

I've also added the --debug flag to gather more info and to see how the user-specified values are actually set (mainly to check whether --set autoscaling.enabled=false takes effect), but since helm upgrade fails, it doesn't print these values, and nothing notable is shown apart from the error above. We cannot change the Kubernetes cluster version, because the cluster is provided by a third party.
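
One way to check whether the flag takes effect, independently of the failing upgrade, might be to render the chart locally and count the HPA objects in the output. This is only a sketch: count_kind is a hypothetical helper, and the helm template invocation in the comment assumes the same placeholders as the pipeline step above.

```shell
# Hypothetical helper: count manifests of a given kind in rendered YAML read from stdin.
count_kind() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c "^kind: ${1}\$" || true
}

# Assumed usage (placeholders as in the pipeline step; helm template renders locally):
#   helm template <MS_NAME> <REPO_NAME>/<MS_NAME> -n <NAMESPACE> \
#     --set autoscaling.enabled=false | count_kind HorizontalPodAutoscaler
```

If the guard works, the count should be 0, which would suggest that the failure comes from somewhere other than the newly rendered templates.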

I suppose the error can be fixed by declaring a correct apiVersion for the HPA, but that would not solve the problem, which is helm upgrade trying to deploy that resource despite the fact it has been disabled.

At this point, I have no clue what could have led to this error, so any suggestion will be appreciated if you have encountered a similar issue in the past!


Solution

  • So, with a colleague of mine, we found out what was going on. tl;dr at the end.

    Long answer

    What we attempted first was to update the apiVersion of the HPA. The error mentions that it is not possible to deploy HPAs in autoscaling/v2beta2, so we updated the version to autoscaling/v2. Unfortunately, it did not work, again making the pipeline fail with the same error.

    After that, I ran helm list and noticed that the Chart for the problematic microservice had not been updated by the pipelines run over the last few days: the UPDATED field was set to a date before the upgrade of the Kubernetes cluster.

    I found that odd. I had pipelines that failed at helm upgrade in the past, yet the deployed Chart in our namespace was still updated, as a quick check with helm list could confirm.

    While looking for a solution, I found this question on Stack Overflow, whose author was facing the same issue. As explained in the linked question, Helm keeps track of its Release state in a Secret, which stores information about the resource types to be deployed in the cluster.
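
    To look at what that Secret holds, the stored release can be decoded by hand. A minimal sketch, assuming Helm 3's Secret storage, where the release field is gzipped JSON that has been base64-encoded twice. list_release_api_versions is a hypothetical helper name, and the Secret name in the comment follows the sh.helm.release.v1.<release>.v<revision> pattern with placeholders.

    ```shell
    # Decode an encoded Helm 3 release payload (the Secret's .data.release) from stdin.
    decode_helm_release() {
      base64 -d | base64 -d | gzip -dc
    }

    # Hypothetical helper: list the distinct apiVersion values found in the stored release.
    list_release_api_versions() {
      decode_helm_release | grep -o 'apiVersion: [A-Za-z0-9./-]*' | sort -u
    }

    # Assumed usage:
    #   kubectl -n <NAMESPACE> get secret sh.helm.release.v1.<MS_NAME>.v<REVISION> \
    #     -o jsonpath='{.data.release}' | list_release_api_versions
    ```

    An autoscaling/v2beta2 entry in that list would indicate that the stored release still references the API the cluster no longer serves.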

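    What happened is that the Kubernetes cluster was upgraded before we updated the Charts, which contained an apiVersion that is no longer served. The Release state Secret for this microservice therefore still recorded an HPA in autoscaling/v2beta2 (instead of autoscaling/v2). As the error message says, helm upgrade first builds the Kubernetes objects from the current release manifest stored in that Secret, and that step failed because the cluster no longer knows the kind in that version. This is also why neither --set autoscaling.enabled=false nor the apiVersion change in the Chart made a difference: the failure comes from the already deployed release, not from the newly rendered templates.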

    Inside the same question, there is also a link to an official guide provided by Helm, that explains what to do in order to update deprecated Kubernetes APIs. The guide provides a step-by-step procedure on how to update the Release state Secret as needed and was exactly what we needed. In fact, it worked for us - updating the Release state Secret and then performing one last helm upgrade made it possible to deploy our microservice via the defined pipeline.
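
    The manual procedure boils down to decoding the stored release, replacing the removed apiVersion, and encoding it again. A rough sketch, assuming Helm 3's Secret storage (gzipped JSON, base64-encoded twice). migrate_release_payload is a hypothetical helper, the kubectl commands in the comments use placeholder names, and the Secret should be backed up before it is patched.

    ```shell
    # Hypothetical helper: rewrite one apiVersion inside an encoded release payload.
    # Reads the Secret's .data.release value on stdin, writes the re-encoded value to stdout.
    migrate_release_payload() {
      old="$1"
      new="$2"
      base64 -d | base64 -d | gzip -dc \
        | sed "s#${old}#${new}#g" \
        | gzip -c | base64 | tr -d '\n' \
        | base64 | tr -d '\n'
    }

    # Assumed usage (back up the Secret first):
    #   SECRET=sh.helm.release.v1.<MS_NAME>.v<REVISION>
    #   kubectl -n <NAMESPACE> get secret "$SECRET" -o yaml > release-backup.yaml
    #   NEW=$(kubectl -n <NAMESPACE> get secret "$SECRET" -o jsonpath='{.data.release}' \
    #     | migrate_release_payload autoscaling/v2beta2 autoscaling/v2)
    #   kubectl -n <NAMESPACE> patch secret "$SECRET" --type merge \
    #     -p "{\"data\":{\"release\":\"$NEW\"}}"
    ```

    The plain text substitution is enough here because the old and new apiVersion strings are the only thing that changes; any manifest whose schema differs between versions would need more than a sed.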

    tl;dr

    Because the Kubernetes cluster was upgraded before we updated the Charts (which contained an apiVersion no longer served by the new Kubernetes version), the Release state Secret of the failing microservice still recorded in its manifest an HPA with apiVersion autoscaling/v2beta2.

    This caused the failure of helm upgrade in our pipeline.

    We found this guide from Helm to manually update said Release state Secret, which worked for us. It updated the Release Secret and now helm upgrade no longer fails.