While the Kubernetes runtime can give a pod more CPU than its 'request', up to its 'limit', there's no guarantee this will happen, since it depends on the node's CPU availability at that moment. All of the CPU may already be in use by other pods and processes.
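For context, this is the kind of spec I'm talking about; a minimal sketch with made-up names:

```yaml
# Minimal sketch (hypothetical names): a container that requests a little CPU
# but is allowed to burst up to a much higher limit.
apiVersion: v1
kind: Pod
metadata:
  name: my-app                        # hypothetical name
spec:
  containers:
    - name: my-app
      image: my-registry/my-app:1.0   # hypothetical image
      resources:
        requests:
          cpu: "250m"                 # used for scheduling and as the CPU-share weight
        limits:
          cpu: "2"                    # ceiling; usage above this gets throttled
```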
If I increase the CPU 'request', there are two downsides:
1- Scheduling the pod becomes harder, since a node with enough free CPU must exist, so deployment can take longer or fail.
2- I reserve an unnecessary amount of CPU for a pod that needs only a little to start and run normally, and will need much more only during high load / peak time.
If I set a small CPU 'request', there's no guarantee that Kubernetes can give the pod enough CPU later, during peak time and high load.
A side concern is the CPU usage alerts I want to set up in my monitoring dashboards.
If I alert on CPU usage as a percentage of 'request', it's pointless, because Kubernetes normally gives the pod what it wants at peak time anyway.
If I instead alert on CPU usage as a percentage of 'limit', Kubernetes may not be able to give the pod enough CPU, and the pod could be struggling most of the time without me knowing it.
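To make that concrete, this is roughly the alert I have in mind, assuming Prometheus with cAdvisor and kube-state-metrics metrics (the metric names and the threshold are my assumptions, not something I have verified for my setup):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-usage-alerts              # hypothetical name
spec:
  groups:
    - name: cpu-usage
      rules:
        - alert: PodCpuNearLimit
          # Fire when a pod's CPU usage stays above 80% of its CPU limit.
          expr: |
            sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)
              /
            sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace, pod)
              > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod CPU usage is above 80% of its limit"
```

A complementary signal, if cAdvisor exposes it in your setup, is the throttling ratio (container_cpu_cfs_throttled_periods_total over container_cpu_cfs_periods_total), which shows directly when a pod is being held back.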
I don't know what to do. I'm not sure whether I've understood how this works, or whether I'm totally mistaken.
CPU is a compressible resource, which means it can be throttled safely: if the node is under CPU pressure, it throttles the CPU used by the containers rather than killing them. If there is spare CPU on the node, it is shared among containers in proportion to their CPU requests, which are used as weights; the higher the request, the bigger the share of the unused CPU.
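As a rough illustration of the weighting (assuming default cgroup behaviour, where the kubelet maps roughly 1 CPU of request to 1024 cpu.shares), these two hypothetical pods would split spare CPU about 1:4 under contention:

```yaml
# Hypothetical pods competing for spare CPU on the same node.
apiVersion: v1
kind: Pod
metadata:
  name: low-priority-batch            # hypothetical name
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "while true; do :; done"]   # CPU-hungry loop
      resources:
        requests:
          cpu: "250m"                 # ~256 shares
---
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-api         # hypothetical name
spec:
  containers:
    - name: api
      image: busybox
      command: ["sh", "-c", "while true; do :; done"]
      resources:
        requests:
          cpu: "1"                    # ~1024 shares, about 4x the weight
```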
If your app's behavior is consistent, you should be able to tune the request / limit numbers to a pretty good estimate. That should reduce throttling on the node, but it won't guarantee your containers never get throttled unless you set request == limit, and then you are wasting the reserved CPU during the valley hours.
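For example, a minimal sketch of the request == limit case: with CPU and memory requests equal to the limits the pod ends up in the Guaranteed QoS class, so its full request is always available to it, but it cannot burst above the limit and the reservation sits idle during valley hours.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example            # hypothetical name
spec:
  containers:
    - name: app
      image: my-registry/my-app:1.0   # hypothetical image
      resources:
        requests:
          cpu: "2"
          memory: "1Gi"
        limits:
          cpu: "2"                    # equal to the request
          memory: "1Gi"
```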
A good solution for peak utilization is the cluster autoscaler: it watches for pods that cannot be scheduled due to insufficient resources and adjusts the size of the cluster accordingly, both up and down: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
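As a very rough sketch of what enabling it looks like (an excerpt of the autoscaler Deployment's container spec; the real manifests in the linked repo are cloud-provider specific, and the node-group name here is made up):

```yaml
# Excerpt from the cluster-autoscaler Deployment's pod template (sketch only).
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2   # pick the version matching your cluster
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws               # assumption: AWS with an autoscaling group
      - --nodes=2:10:my-app-node-group     # min:max:node-group-name (hypothetical)
      - --balance-similar-node-groups
      - --scale-down-enabled=true
```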
This is the link to the relevant design proposals around QoS if you want to know more: