I am using the Kubernetes HPA to autoscale a WebSocket server (Apollo GraphQL Subscriptions).
When it decides to scale down, what happens to active connections on that pod?
Ideally, it would mark a pod for scaling down, and stop sending new connections to it, but wait until all active connections are closed before removal.
Is that what it does, or is there any way to achieve that?
Kubernetes doesn't know anything about "connections". If the HPA decides a Deployment should be scaled in, the HPA controller decreases the Deployment's replicas: value, and the Deployment controller then terminates an arbitrary Pod it manages. From there the normal Pod shutdown sequence takes over: the Pod receives SIGTERM and is removed from its Service's endpoints. It can use this to clean up politely and promptly, but it will eventually terminate no matter what it does.
This means that, in your environment, there is some possibility that scale-in will cause a WebSocket connection to be dropped. It isn't the only thing that can (network hiccups and software failures happen), so the client should be able to reconnect regardless. In a GraphQL context, that means the client needs to re-send the same subscription request.
If your application can register signal handlers (and its image isn't built in a way that loses signals, such as a shell-form ENTRYPOINT that doesn't forward them), then it can handle SIGTERM. Kubernetes sends this signal when a Pod is being deleted, but it is also a normal Unix signal that ^C and kill(1) can send. Your application can react to it by closing its WebSocket connections gracefully and then exiting on its own, as in the sketch below.
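Here's a minimal sketch of that shutdown path, assuming a plain Node server with the ws package underneath the Apollo subscription setup; the server wiring and the 10-second drain window are illustrative, not anything Kubernetes mandates:

```typescript
import { createServer } from "http";
import { WebSocketServer } from "ws";

// Plain HTTP server; an Apollo/GraphQL subscription server would
// normally be attached on top of this same listener.
const httpServer = createServer();
const wss = new WebSocketServer({ server: httpServer });

httpServer.listen(4000);

process.on("SIGTERM", () => {
  // Stop accepting new HTTP connections / WebSocket upgrades.
  // The callback fires once every tracked connection has closed.
  httpServer.close(() => {
    process.exit(0);
  });

  // Ask each client to go away politely (close code 1001, "going away");
  // well-behaved clients will reconnect and land on another Pod.
  for (const client of wss.clients) {
    client.close(1001, "server shutting down");
  }

  // Safety net: exit before Kubernetes' grace period runs out
  // and it escalates to SIGKILL. (unref() keeps this timer from
  // holding the event loop open on its own.)
  setTimeout(() => process.exit(0), 10_000).unref();
});
```

Because Kubernetes removes the terminating Pod from its Service at roughly the same time it sends SIGTERM, clients that reconnect during this drain should land on one of the remaining Pods.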
If you don't (or can't) handle the signal, or if your process doesn't exit within the 30-second grace period (configurable via terminationGracePeriodSeconds on the Pod spec), Kubernetes kills the Pod. Your process receives SIGKILL, which cannot be caught or handled, and any outstanding connections are closed abruptly.
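If your connections routinely need longer than 30 seconds to drain, you can raise that grace period in the Deployment's Pod template; the names and the value here are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-server          # hypothetical name
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120   # default is 30
      containers:
        - name: server
          image: registry.example.com/websocket-server  # hypothetical image
```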