Can anyone explain what the 'optimize-utilization' setting for the GKE autoscaler specifically does differently from standard autoscaling? It claims to be more aggressive in downscaling, but does that mean it ignores the pod disruption budget, uses a different threshold for max resource usage (50% for the standard profile), or waits 1 minute before scaling down instead of the normal 10 minutes? It is all very vague to me, and I want to know the consequences before turning it on.
From Cluster Autoscaler Documentation:
optimize-utilization
: Prioritize optimizing utilization over keeping spare resources in the cluster. When enabled, Cluster Autoscaler will scale down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.
Promoted Autoscaling Profiles to beta. Use with gcloud beta container clusters create or gcloud container clusters update: --autoscaling-profile=balanced (default) or --autoscaling-profile=optimize-utilization.
At beta, products or features are ready for broader customer testing and use. Betas are often publicly announced. There are no SLAs or technical support obligations in a beta release unless otherwise specified in product terms or the terms of a particular beta program. The average beta phase lasts about six months.
Being recently promoted to beta probably means that it is still being assessed and fine-tuned before being released and properly documented.
The official suggestion to use this profile only for batch workloads (jobs), not for serving workloads, reinforces the point that it is not ready for all production environments.
I suggest you follow the recommendations provided, and if you are looking to apply it to serving workloads, I'd wait a few months until it's promoted to General Availability.
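If you want to try the profile on a batch cluster in the meantime, a minimal sketch of the update command could look like the following. The `--autoscaling-profile` flag and its two values come from the release note quoted above; the cluster name and zone are hypothetical placeholders, and `--zone` assumes a zonal cluster. The snippet only prints the command so you can review it before running it.

```shell
# Hypothetical placeholders: replace with your own cluster name and zone.
CLUSTER_NAME="batch-cluster"
ZONE="us-central1-a"

# "optimize-utilization" enables the aggressive scale-down profile;
# "balanced" is the default and can be used to switch back.
PROFILE="optimize-utilization"

# Assemble the gcloud invocation (beta track, as in the quoted release note).
CMD="gcloud beta container clusters update ${CLUSTER_NAME} --zone ${ZONE} --autoscaling-profile ${PROFILE}"

# Print the command for review instead of executing it right away.
echo "${CMD}"
```

Once the printed command looks right, run it as-is. Setting `PROFILE` to `balanced` and running the result again should return the cluster to the default behaviour.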