Our team has recently been trying to run a DR proving exercise on the VMs in a regional MIG in a GCP project.
We followed Google's documentation (https://cloud.google.com/compute/docs/instance-groups/regional-mig-simulate-zonal-outage) for simulating a zonal loss on a regional MIG, using a failure script that deletes the instance every time it tries to rebuild after boot.
While simulating the zonal outage on a VM in the regional MIG, the MIG tries to rebuild the VM in the primary (impacted) zone instead of the remaining zone. Ideally, that would not be the case during an actual outage.
The VMs were created from an instance template. Autoscaling and autohealing are not configured on the MIG. The target distribution shape is EVEN.
Our regional MIG is deployed in two zones (europe-west2-b and europe-west2-a), with europe-west2-b as the primary zone, so during a zonal outage the VM should fail over to europe-west2-a. However, that is not happening here.
Are there any other recommendations for running a DR proving exercise on regional MIGs?
There are multiple issues here:
The approach taken in https://cloud.google.com/compute/docs/instance-groups/regional-mig-simulate-zonal-outage is to check whether you are already overprovisioned today, i.e. whether your workload would still have enough capacity if all the VMs in one of the zones stopped serving correctly.
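
To make the overprovisioning check concrete, here is a minimal Python sketch. It is not part of Google's documentation or any GCP API; the function name `survives_zone_loss`, the per-zone VM counts, and the `required_vms` threshold are all hypothetical values chosen for illustration. It only does the arithmetic: for each zone, would the VMs left in the other zones still meet the capacity you need?

```python
from typing import Dict


def survives_zone_loss(vms_per_zone: Dict[str, int], required_vms: int) -> Dict[str, bool]:
    """For each zone, report whether the VMs in the remaining zones
    still cover required_vms if that zone stops serving entirely.

    Hypothetical helper for illustration; it does not call any GCP API.
    """
    total = sum(vms_per_zone.values())
    # Losing a zone removes all of its VMs from the serving pool.
    return {
        zone: (total - count) >= required_vms
        for zone, count in vms_per_zone.items()
    }


# Assumed layout: one VM per zone, and one VM is enough to serve the workload.
layout = {"europe-west2-a": 1, "europe-west2-b": 1}
print(survives_zone_loss(layout, required_vms=1))
```

If the workload needs more VMs than a single remaining zone currently holds (for example `required_vms=2` with the layout above), every entry comes back `False`, which would mean the group is not overprovisioned enough for this kind of test to pass.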