Delayed sequential restart of Compute Engine VMs in Managed Instance Groups


I have a Managed Instance Group of Google Compute Engine VMs (based on a template with container deployment on Container-Optimized OS). The MIG is regional (multi-zoned).

I can release an updated container image (docker run, docker tag, docker push), and then I'd like to restart all VMs in the MIG one by one so that they pick up the updated container (I'm not sure whether there's a simpler or better way to refresh a VM's attached container). I also want to introduce a slight delay (say 60 seconds) between each VM's restart, so that only one or two VMs are unavailable at any given time.

What are some ways to do this programmatically (either via gcloud CLI or their API)?

I tried a rolling restart of the MIG, with maximum unavailable and minimum wait time flags set:

gcloud beta compute instance-groups managed rolling-action restart MIG_NAME \
    --project="..." --region="..." \
    --max-unavailable=1 --min-ready=60

... but it returns an error:

ERROR: (gcloud.beta.compute.instance-groups.managed.rolling-action.restart) Could not fetch resource:
 - Invalid value for field 'resource.updatePolicy.maxUnavailable.fixed': '1'. Fixed updatePolicy.maxUnavailable for regional managed instance group has to be either 0 or at least equal to the number of zones.

Is there a way to perform one-by-one instance restarts with a slight delay in between each action?


Solution

  • Unfortunately, MIGs don't handle this use case for regional deployments as of Jan 2023. You can, however, orchestrate the rolling update yourself along these lines (pseudocode):

    for (INSTANCE in instances)
      // Force restart the instance
      gcloud compute instance-groups managed update-instances MIG_NAME \
          --project="..." --region="..." \
          --instances=INSTANCE --minimal-action=RESTART \
          --most-disruptive-allowed-action=RESTART

      // Give the instance time to come back up (e.g. 60 seconds)
      WAIT

      if (container on INSTANCE not working correctly)
          // Break out of the loop and alert the operator
          BREAK
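
The loop above could be written as a small bash script along the following lines. This is only a sketch under a few assumptions: the instance names are read with `list-instances` and a `--format` projection, the delay is a plain `sleep`, and `check_instance` is a hypothetical placeholder (not a gcloud feature) that you would replace with whatever probe tells you the container is serving correctly. The function name `rolling_restart` and its argument order are likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: restart the VMs of a regional MIG one at a time, pausing between each.
set -euo pipefail

# Hypothetical placeholder: replace the body with a real probe
# (HTTP request, SSH command, ...). Return non-zero when unhealthy.
check_instance() {
  local instance="$1"
  return 0
}

# Usage: rolling_restart MIG_NAME PROJECT REGION [DELAY_SECONDS]
rolling_restart() {
  local mig="$1" project="$2" region="$3" delay="${4:-60}"
  local instances instance

  # Fetch the instance names up front so a failing gcloud call aborts the run.
  instances=$(gcloud compute instance-groups managed list-instances "$mig" \
    --project="$project" --region="$region" \
    --format="value(instance.basename())")

  for instance in $instances; do
    echo "Restarting $instance"
    gcloud compute instance-groups managed update-instances "$mig" \
      --project="$project" --region="$region" \
      --instances="$instance" \
      --minimal-action=restart \
      --most-disruptive-allowed-action=restart

    # Give the VM and its container time to come back up.
    sleep "$delay"

    if ! check_instance "$instance"; then
      echo "Instance $instance looks unhealthy; stopping the rollout." >&2
      return 1
    fi
  done
}

# Run only when executed directly with at least MIG, project and region.
if [[ "${BASH_SOURCE[0]}" == "$0" ]] && [[ $# -ge 3 ]]; then
  rolling_restart "$@"
fi
```

Saved as, say, `rolling_restart.sh`, it would be invoked as `./rolling_restart.sh MIG_NAME PROJECT REGION 60`. Stopping at the first unhealthy instance keeps a bad image from taking down the rest of the group.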