I am playing around with the Horizontal Pod Autoscaler (HPA) in Kubernetes. I've set the HPA to start new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected: the HPA triggers a rescale even though the CPU utilization is far below the defined target utilization. As seen below, the "current" utilization is 10%, which is far from 35%, but it still rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). It also shows that the requested CPU utilization hasn't surpassed the 35% threshold, yet several rescales occurred.
The content of my HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  {{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
  {{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
  {{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
This is tricky and could be a bug, but I don't think it is: most of the time, people configure values that are too low, as I'll explain.
targetCPUUtilizationPercentage relates to the Pod's resource requests. It sets a percentage of the CPU that the pod has requested, not a percentage of a physical core. On Kubernetes, an HPA based on CPU utilization can't work unless the pods specify CPU requests. It also makes sense to add some limits, because most of the time we don't want a pod to use all the available physical CPU.
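To make the "percentage of the request" idea concrete, here is a simplified Python sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), skipped when the ratio is within the default 10% tolerance. It deliberately ignores pod readiness, missing metrics and stabilization windows, and the per-pod usage and request numbers are hypothetical.

```python
import math

# Default HPA tolerance: usage/target ratios within 10% of 1.0 cause no change.
TOLERANCE = 0.1


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Average CPU utilization across pods, relative to their CPU *requests*."""
    return 100 * sum(usage_millicores) / sum(request_millicores)


def desired_replicas(usage_millicores, request_millicores, target_percent,
                     min_replicas, max_replicas):
    current = len(usage_millicores)
    ratio = cpu_utilization_percent(usage_millicores, request_millicores) / target_percent
    if abs(ratio - 1.0) <= TOLERANCE:
        desired = current  # close enough to the target: keep the current size
    else:
        desired = math.ceil(current * ratio)
    # Clamp to the HPA's minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Five pods, each requesting 100m (hypothetical) and using 40m: 40% > 35% target.
print(desired_replicas([40] * 5, [100] * 5, 35, 5, 35))  # 6
```

Note that 40m of usage is only 4% of a physical core, yet it is 40% of a 100m request, which is enough to push five pods to six.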
Let's assume these are our requests:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      requests:
        cpu: 1000m
And in the HPA, we set targetCPUUtilizationPercentage to 75%.
This case is easy to explain: we request 100% of a single core (1000m = 1 CPU core), so when that core is at about 75% usage, the HPA will start scaling.
But suppose we define our requests like this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      requests:
        cpu: 500m
Now, 100% of the CPU our pod has requested is only 50% of a single core. In other words, when this pod uses 100% of its requested CPU, the hardware sees 50% usage of a single core.
targetCPUUtilizationPercentage doesn't take this into account: if we keep our value of 75%, the HPA will start scaling when our single core is at about 37.5% usage, because that is 75% of all the CPU this pod requested.
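The arithmetic from both examples can be checked with a few lines of Python. The function name is illustrative only; the threshold in absolute terms is simply request * target / 100, and dividing millicores by 10 gives the percentage of one physical core.

```python
def threshold_millicores(request_millicores, target_percent):
    """Per-pod CPU usage (in millicores) at which the HPA target is reached."""
    return request_millicores * target_percent / 100


for request in (1000, 500):
    t = threshold_millicores(request, 75)
    # 1000m equals one core, so millicores / 10 is the percentage of one core.
    print(f"request={request}m target=75% -> scales at {t:.0f}m ({t / 10:.1f}% of one core)")
```

With a 1000m request this prints a threshold of 750m (75.0% of one core); with 500m it prints 375m (37.5% of one core).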
From the perspective of a pod or the HPA, they never know how much CPU or memory they are actually limited to; the HPA only compares usage against the requests.
With some programs, like the one in the question above, CPU spikes do occur, but only for short periods (for example, 10-second spikes). Because these spikes are so short, the metrics server doesn't store them; it only stores the metric over a 1-minute window, so a spike that falls between those windows is excluded. This explains why the spike cannot be seen in the metrics dashboards but is still picked up by the HPA.
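To illustrate how averaging can hide a short spike, here is a small Python sketch with hypothetical samples (one every 10 seconds, as a percentage of the CPU request). It does not model metrics-server or the HPA internals; it only shows that a 1-minute average can stay under a 35% target while a single 10-second sample is well above it.

```python
# Hypothetical CPU samples for one pod, taken every 10 seconds over one minute,
# expressed as a percentage of the pod's CPU request. One 10-second spike.
samples = [10, 10, 10, 90, 10, 10]
TARGET_PERCENT = 35  # targetCPUUtilizationPercentage from the question


def window_average(values):
    """Plain average over the window, as a 1-minute aggregate would report."""
    return sum(values) / len(values)


print(f"peak sample:   {max(samples)}%")                 # 90%
print(f"1-minute mean: {window_average(samples):.1f}%")  # 23.3%
print("spike above target:", max(samples) > TARGET_PERCENT)            # True
print("mean above target: ", window_average(samples) > TARGET_PERCENT)  # False
```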
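Thus, for services with low CPU requests/limits, a larger scale-up time window (the scaleUp behavior settings of the HPA, available from the autoscaling/v2beta2 API onwards, not in the autoscaling/v1 API used in the question) can be ideal.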
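In the autoscaling/v2 API, this window is the behavior.scaleUp.stabilizationWindowSeconds field (default 0, while scale-down defaults to 300). The Python sketch below is a simplified model of stabilization as the Kubernetes docs describe it: inside the scale-up window the lowest recent recommendation wins, and inside the scale-down window the highest one wins. The function name and the 15-second recommendation history are hypothetical, and real HPA policies (rate limits per period) are not modeled.

```python
def stabilize(history, now, desired, current, up_window, down_window):
    """Simplified HPA stabilization (hypothetical helper, not the real API).

    history: list of (timestamp_seconds, recommended_replicas) from earlier syncs.
    desired: the raw recommendation computed at `now`.
    """
    up_limit = desired    # lowest recommendation inside the scale-up window
    down_limit = desired  # highest recommendation inside the scale-down window
    for ts, rec in history:
        if ts > now - up_window:
            up_limit = min(up_limit, rec)
        if ts > now - down_window:
            down_limit = max(down_limit, rec)
    result = current
    if result < up_limit:
        result = up_limit    # scale up only as far as the lowest recent recommendation
    if result > down_limit:
        result = down_limit  # scale down only as far as the highest recent recommendation
    return result


# Recommendations every 15 s were 5 replicas; a short spike now asks for 6.
history = [(0, 5), (15, 5), (30, 5), (45, 5)]
print(stabilize(history, 60, 6, 5, up_window=0, down_window=300))   # 6: scales up
print(stabilize(history, 60, 6, 5, up_window=60, down_window=300))  # 5: spike ignored
```

With the default scale-up window of 0, a single spiky recommendation is applied immediately; with a 60-second window, the spike is ignored because earlier recommendations in that window still asked for 5 replicas.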