Assume an HPA setup where targetCPUUtilizationPercentage is set to 50 and minReplicas is set to 1.
When there is a single pod running and the load increases so that CPU usage reaches 60%, the HPA will spawn a new pod because the average CPU utilization is higher than the 50% target.
Since there are now two pods handling the same load, and assuming the load is split equally between them, each pod can be expected to run at about 30% CPU.
In this situation, would the HPA calculate the average CPU to be 30% and remove one pod? That looks problematic: to handle the same load, the remaining pod's CPU would climb back to 60%, causing new pods to come and go repeatedly.
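To restate the scenario numerically, here is a hypothetical back-of-the-envelope sketch in Python (the variable names are mine; the numbers are the ones from above):

```python
# The scenario as described above (illustrative numbers only).
target_utilization = 50   # targetCPUUtilizationPercentage
total_load = 60           # total CPU demand, in percent of a single pod

pods = 1
avg_cpu = total_load / pods   # 60% > 50% target, so the HPA adds a pod

pods = 2
avg_cpu = total_load / pods   # 30% per pod -- does the HPA now remove a pod,
                              # sending the survivor back to 60%?
```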
See the autoscaling formula in the Kubernetes HPA documentation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
Now you have two cases. In both cases desiredMetricValue is 50%. In the first case, currentReplicas is 1 and currentMetricValue is 60%; in the second, currentReplicas is 2 and currentMetricValue is 30%. Those yield:
desiredReplicas_1 = ceil[1 * (60 / 50)] = ceil[1 * 1.2] = ceil[1.2] = 2
desiredReplicas_2 = ceil[2 * (30 / 50)] = ceil[2 * 0.6] = ceil[1.2] = 2
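To make the arithmetic easy to check, here is a minimal Python sketch of the formula above (the function name and structure are illustrative, not the actual controller code):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float) -> int:
    # desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    return math.ceil(current_replicas * (current_metric / desired_metric))

# Case 1: one pod averaging 60% CPU against a 50% target
print(desired_replicas(1, 60, 50))  # -> 2

# Case 2: two pods averaging 30% CPU against a 50% target
print(desired_replicas(2, 30, 50))  # -> 2 (no scale-down, no flapping)
```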
That is, because the formula multiplies by the current number of replicas, both cases produce the same result (2 replicas), so the replica count stays stable after the scale-up instead of oscillating.
Another way to see this is to write out the average-value calculation explicitly:
currentMetricValue = totalMetricValue / currentReplicas
where totalMetricValue is the sum of the metric across all replicas. The two occurrences of currentReplicas then cancel out inside the formula, leaving you with:
desiredReplicas = ceil[totalMetricValue / desiredMetricValue]
so the desired replica count depends only on the total load: the scale-up that adds a new Pod with (so far) zero utilization doesn't affect the target replica count at all, since totalMetricValue is still 60% and ceil[60 / 50] = 2 both before and after the load is rebalanced.
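As a quick numeric check of this cancellation (same illustrative Python as above), computing the desired count from the per-replica average and from the total gives identical answers for any replica count:

```python
import math

total_cpu = 60.0   # totalMetricValue: summed CPU utilization across all pods
target = 50.0      # desiredMetricValue

for replicas in (1, 2, 3):
    avg = total_cpu / replicas                 # currentMetricValue as an average
    via_average = math.ceil(replicas * (avg / target))
    via_total = math.ceil(total_cpu / target)  # currentReplicas cancelled out
    print(replicas, via_average, via_total)    # -> 1 2 2 / 2 2 2 / 3 2 2
```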