google-cloud-platform google-compute-engine load-balancing preemptive

GCP Load Balancer behaviour with terminating preemptive instances

Background

We have a dispatcher instance group that receives around 700 requests per second per active VM. This dispatcher is behind a Load Balancer that autoscales. Thus far all our VMs are regular VMs; however, we have been studying the possibility of making them preemptive.

The problem with preemptive instances

According to the documentation, GCP can terminate a preemptive instance at any time.

Let's assume that each dispatcher VM holds no state. It receives a request, processes it and makes an HTTP request to some other machine.

At any given time, each VM will be processing around 700 requests concurrently, while receiving data from the load balancer.

Question

What happens if my preemptive VM, processing 700 requests, receives a signal to be terminated?

Well, in theory one should have a shutdown script that lets the in-flight requests finish processing and then stops the app (clean exit). This leads us to the big question:

But does the load balancer know that my VM is shutting down? Will it keep sending requests to the terminating VM?

Considerations

If yes, then some requests will fail: once the app shuts down, the machine is still up, and the load balancer keeps sending requests to it without knowing the app is already down.

Ideally, these requests would go back to the load balancer as failed requests and it would send them to another machine. However, GCP load balancers are not smart enough to do this, and so they don't.

If the load balancer somehow knows this VM was selected for preemptive termination, then nothing special needs to be done.

Which one is it?

Solution

But does the load balancer know that my VM is shutting down? Will it keep sending requests to the terminating VM?

Yes, the load balancer will continue to send requests to the instance.

You will need to create a shutdown script and remove your instance from the load balancer.
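One way to take the instance out of rotation without calling the load balancer API is to make its health check fail as soon as the preemption notice arrives. Below is a minimal Python sketch of that idea, not a definitive implementation. It assumes the app can expose a health endpoint on port 8080 (a hypothetical choice) and that the load balancer health check points at it. The names `draining`, `health_status`, `HealthHandler`, `watch_preemption` and `main` are made up for illustration. The metadata URL and the `Metadata-Flavor` header are the documented way to read the `preempted` flag from inside a VM.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Metadata endpoint that reports "TRUE" once the VM has been preempted.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/preempted?wait_for_change=true"
)

# Set once this VM should stop receiving new requests.
draining = threading.Event()


def health_status(is_draining: bool) -> int:
    """Status code for the health check: 503 marks the backend unhealthy."""
    return 503 if is_draining else 200


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the (hypothetical) health check endpoint polled by the load balancer."""

    def do_GET(self):
        self.send_response(health_status(draining.is_set()))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep health check polling out of the logs


def watch_preemption() -> None:
    """Block on the metadata server and start draining once preempted."""
    request = urllib.request.Request(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}
    )
    while not draining.is_set():
        try:
            with urllib.request.urlopen(request) as response:
                if response.read().decode().strip() == "TRUE":
                    draining.set()
        except OSError:
            time.sleep(1)  # metadata server unreachable, try again


def main() -> None:
    # Assumption: port 8080 is free and matches the configured health check.
    threading.Thread(target=watch_preemption, daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Call `main()` from the process that runs next to the app, and keep the app serving in-flight requests while `draining` is set. The load balancer only stops routing after the health check has failed its configured number of times, and a preempted VM gets roughly 30 seconds before it is stopped, so the check interval and unhealthy threshold need to be short enough to fit inside that window.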

It is not that the load balancer is not smart enough. The load balancer does not know if your requests can be retried. That decision should be made by the client / backend logic.
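To illustrate what "made by the client" can look like, here is a small Python sketch of a caller that retries only requests it knows are safe to repeat. Everything in it is an assumption for illustration: the `send_with_retry` name, the `send` callable that returns an HTTP status code, and the choice of 502 and 503 as the statuses seen when a backend disappears.

```python
import time
from typing import Callable

# Assumption: these are the statuses the client sees when a backend goes away.
RETRYABLE = {502, 503}


def send_with_retry(
    send: Callable[[], int],
    idempotent: bool,
    attempts: int = 3,
    backoff_s: float = 0.1,
) -> int:
    """Call send() and retry retryable failures, but only for idempotent requests."""
    tries = attempts if idempotent else 1  # never repeat a non-idempotent request
    status = 0
    for attempt in range(tries):
        status = send()
        if status not in RETRYABLE:
            return status
        if attempt < tries - 1:
            time.sleep(backoff_s * 2 ** attempt)  # exponential backoff between tries
    return status
```

The point of the `idempotent` flag is that only the caller knows whether sending the same request twice is harmless. The load balancer cannot make that call on its behalf.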

Your use case is not a good fit for preemptive instances. Preemptive instances are always terminated after at most 24 hours of running, and can be terminated sooner. If your goal is cost savings, compare the cost of long-term instance pricing to preemptive pricing. The savings are not enough to justify the engineering, testing and QA costs.

Architectures should be designed for failure, but I would not deliberately pick an architecture that will fail constantly, which in your case means at least once every 24 hours. There is also the risk that you will not be able to launch another instance to make up for the increased load, and the risk that all your instances will be terminated at the same time.