Autoscaling helps you to automatically add or remove compute engines based on the load. The prerequisites to autoscaling in GCP are instance template and managed instance group.
This question is a part of another question's answer, which is about building an autoscaled and load-balanced backend.
I have written the below answer that contains the steps to set up autoscaling in GCP.
Autoscaling is a feature of managed instance group in GCP. This helps to handle very high traffic by scaling up the instances and at the same time it also scales down the instances when there is no traffic, which saves a lot of money.
To set up autoscaling, we need the following:
Instance template is a blueprint that defines the machine-type, image, disks of the homogeneous instances that will be running in the autoscaled, managed instance group. I have written the steps for setting up an instance template here.
Managed instance group helps in keeping a group of homogeneous instances that is based on a single instance template. Assuming the instance template as sample-template. This can be set up by running the following command in gcloud:
gcloud compute instance-groups managed \
create autoscale-managed-instance-group \
--base-instance-name autoscaled-instance \
--size 3 \
--template sample-template \
--region asia-northeast1
The above command creates a managed instance group containing 3 compute engines located in three different zones in asia-northeast1 region, based on the sample-template.
Autoscaling policy determines the autoscaler behaviour. The autoscaler aggregates data from the instances and compares it with the desired capacity as specified in the policy and determines the action to be taken. There are many auto-scaling policies like:
Average CPU Utilization
HTTP load balancing serving capacity (requests / second)
Stackdriver standard and custom metrics
and many more
Now, Introducing Autoscaling to this managed instance group by running the following command in gcloud:
gcloud compute instance-groups managed \
set-autoscaling \
autoscale-managed-instance-group \
--max-num-replicas 6 \
--min-num-replicas 2 \
--target-cpu-utilization 0.60 \
--cool-down-period 120 \
--region asia-northeast1
The above command sets up an autoscaler based on CPU utilization ranging from 2 (in case of no traffic) to 6 (in case of heavy traffic).
Best Practices: From my perspective, it is better to create a custom image with all your software installed than to use a startup script. As the time taken to launch new instances in the autoscaling group should be as minimum as possible. This will increase the speed at which you scale your web app.
This is part 2 of 3-part series about building an autoscaled and load-balanced backend.