I am trying to understand how hpa works but I have some concerns:
In case my service is set like this:
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 250m
memory: 512Mi
and I configure hpa in this way:
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: test-service
minReplicas: 3
maxReplicas: 6
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
Is it preventing my service to reach the limits (500m), right?
Is it better to configure by putting a higher value like 80%?
I have this doubt because with this configuration I see pods scaled to the maximum number even if they are using less cpu than limits:
NAME CPU(cores) MEMORY(bytes)
test-service-76f8b8c894-2f944 189m 283Mi
test-service-76f8b8c894-2ztt6 183m 278Mi
test-service-76f8b8c894-4htzg 117m 233Mi
test-service-76f8b8c894-5hxhv 142m 193Mi
test-service-76f8b8c894-6bzbj 140m 200Mi
test-service-76f8b8c894-6sj5m 149m 261Mi
The amount of CPU used is less than the request configured in the definition of the service.
Moreover, I have seen that it has been discussed here as well but I didn't get the answer. Using Horizontal Pod Autoscaling along with resource requests and limits
Is it preventing my service to reach the limits (500m), right?
No, hpa is not preventing it (althogh resources.limits is). What hpa does is starting new replicas when the average cpu utilization across all pods gets above 50% of requested cpu resources, i.e. above 125m.
Is it better to configure by putting a higher value like 80%?
Can't say, it is application specific.
Horizontal autoscaling is pretty well described in the documentation.