when I change the replicas: x
in my .yaml file I can see GKE autopilot boots pods up/down depending on the value, but what will happen if the load on my deployment gets too big. Will it then autoscale the number of pods and nodes to handle the traffic and then reduce back to the value specified in replicas when the request load is reduced again?
I'm basically asking how does autopilot horizontal autoscaling works? and how do I get a minimum of 2 pod replicas that can horizontally autoscale in autopilot?
GKE autopilot by default will not scale the replicas count beyond what you specified. This is the default behavior of Kubernetes in general.
If you want automatic autoscaling you have to use Horizental Pod Autoscaler (HPA) which is supported in Autopilot
If you deploy HPA to scale up and down your workload, Autopilot will scale up and down the nodes automatically and that's transparent for you as the nodes are managed by Google.