kubernetes · google-cloud-platform · google-kubernetes-engine · cpu-usage · hpa

HPA Scaling even though Current CPU is below Target CPU


I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected: the HPA triggers a rescale even though the CPU utilization is far below the defined target utilization. As seen in the screenshot below, the current utilization is 10%, which is far from 35%. Still, it rescaled the number of pods from 5 to 6.

[screenshot: HPA status showing 10% current utilization vs. a 35% target]

I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). It also shows that the requested CPU utilization hasn't surpassed the 35% threshold, yet several rescales occurred.

[screenshot: GCP dashboard showing CPU utilization staying below 35%]

This is the content of my HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
{{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
{{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
{{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35

Does anyone know what the cause of this might be?


Solution

  • This is tricky and could be a bug, but I don't think so; most of the time people configure values that are too low, as I'll explain.

    How targetCPUUtilizationPercentage relates to a Pod's resource requests

    targetCPUUtilizationPercentage configures a percentage based on the CPU requests the pod has specified. On Kubernetes, a CPU-based HPA cannot work without CPU requests being set on the pods it scales. It also makes sense to add limits, because most of the time we don't want a single pod to consume all the available physical CPU.

    Let's assume these are our requests:

    apiVersion: v1
    kind: Pod
    metadata:
      name: apache
    spec:
      containers:
        - name: apache
          image: httpd:alpine
          resources:
            requests:
              cpu: 1000m
    

    And in targetCPUUtilizationPercentage inside the HPA we specify 75%.

    That case is easy to reason about: we request 100% of a single core (1000m = 1 CPU core), so when that core is at about 75% usage, the HPA will start to scale.

    But if we define our requests like this:

    spec:
      containers:
        - name: apache
          image: httpd:alpine
          resources:
            requests:
              cpu: 500m
    

    Now, 100% of the CPU our pod has requested is only 50% of a single core. In other words, 100% CPU utilization for this pod corresponds, on the hardware, to 50% usage of a single core.

    targetCPUUtilizationPercentage does not care about this distinction: if we keep our value of 75%, the HPA will start to scale when the single core is at about 37.5% usage (375m), because that is 75% of all the CPU this pod requested (see the sketch below).

    From the perspective of the pod and the HPA, they never know that they are being limited on CPU or memory; the percentages are always relative to the pod's requests.
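
    To make the arithmetic concrete, here is a minimal sketch of an HPA for the apache example above. It assumes the container runs in a Deployment named apache; that Deployment and the replica bounds are illustrative assumptions, not something from the question:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: apache
    spec:
      minReplicas: 1
      maxReplicas: 5
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: apache        # hypothetical Deployment wrapping the apache container
      # With requests of 1000m, scaling starts around 750m of real core usage.
      # With requests of 500m, scaling starts around 375m (37.5% of one core).
      targetCPUUtilizationPercentage: 75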

    Understanding the scenario in the question above

    With some programs, like the one used in the question above, CPU spikes do occur, but only over short timeframes (for example, 10-second spikes). The monitoring metrics shown in the dashboard are aggregated over a 1-minute window, so a spike that falls between such windows is averaged away and excluded from what you see. The HPA, on the other hand, samples the resource metrics more frequently (every 15 seconds by default) and therefore reacts to the spike. This explains why the spike cannot be seen in the metrics dashboards but is still picked up by the HPA.

    Thus, for services with low CPU requests/limits, a larger scale-up stabilization window (the behavior.scaleUp settings of the HPA, available in the autoscaling/v2 API) can be ideal, as sketched below.
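
    A minimal sketch of what that could look like with the autoscaling/v2 API, reusing the names from the question; the 120-second window and the policy values are arbitrary example numbers, not recommendations:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: django
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: django-app
      minReplicas: 5
      maxReplicas: 35
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 35
      behavior:
        scaleUp:
          # Consider the scaling recommendations from the last 120 seconds and
          # act on the most conservative one, so a short CPU spike does not
          # immediately add pods.
          stabilizationWindowSeconds: 120
          policies:
            - type: Pods
              value: 2
              periodSeconds: 60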