I am looking for best practices for handling server restarts. Specifically, I push stock prices to users over websockets for a day trading simulation web app. I have 10k concurrent users. To ensure a responsive UX, I reconnect to the websocket when the onclose event is fired. As our user base has grown, we have had to scale our hardware. In addition to better hardware, we have implemented a random delay before reconnecting. The goal is to spread out the influx of handshakes when the server restarts every night (Continuous Deployment). However, some of our users have poor internet (ISP and/or Wi-Fi), and their connection constantly drops. For these users I would prefer that they reconnect immediately. Is there a solution to this problem that doesn't have the aforementioned tradeoffs?
The question calls for a subjective response; here is mine :)
Distinguishing a client disconnection from a server shutdown:
This can be achieved by broadcasting a shutdown message over the websocket before the server goes down, so that active clients can prepare and reconnect after a random delay. A client that gets an onclose event without a preceding shutdown broadcast can then reconnect immediately. This means the client application needs to be modified to handle this special shutdown event.
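As a rough client-side sketch of the idea above (not a definitive implementation): the `{"type": "shutdown"}` message shape, the 30-second spread window, and the names `reconnectDelayMs` and `connect` are all assumptions made for illustration, so adapt them to whatever your server actually sends.

```typescript
// Hypothetical message shape: the server is assumed to broadcast
// {"type": "shutdown"} to every client just before it restarts.
type ServerMessage = { type: string };

// Assumed window over which reconnects are spread after a server restart.
const MAX_RESTART_DELAY_MS = 30_000;

// Random delay if the server announced a shutdown, no delay otherwise.
function reconnectDelayMs(
  sawShutdownNotice: boolean,
  random: () => number = Math.random
): number {
  return sawShutdownNotice ? Math.floor(random() * MAX_RESTART_DELAY_MS) : 0;
}

function connect(url: string): void {
  let sawShutdownNotice = false;
  const socket = new WebSocket(url);

  socket.onmessage = (event: MessageEvent) => {
    // Assumes every message is JSON text.
    const message = JSON.parse(String(event.data)) as ServerMessage;
    if (message.type === "shutdown") {
      sawShutdownNotice = true; // remember it for the upcoming onclose
      return;
    }
    // ... handle price updates here ...
  };

  socket.onclose = () => {
    // Dropped connection: reconnect right away. Announced restart: wait.
    setTimeout(() => connect(url), reconnectDelayMs(sawShutdownNotice));
  };
}
```

With this, users on a flaky connection never see the shutdown flag and so reconnect with zero delay, while a nightly restart spreads the 10k handshakes over the window. You would probably still want some backoff for repeated failed attempts.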
Handling the handshake load:
Some web servers can handle incoming connections as an asynchronous parallel event queue: at most X connections are initialized at the same time (in parallel), and the others wait in a queue until their turn comes. This safeguards server performance, and the websocket handshakes are automatically delayed according to the true processing capacity of the server. Of course, this may mean changing web server technology, and whether it is worth it depends on your use case.
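To illustrate the queueing idea in isolation, here is a minimal sketch of a concurrency limiter. `HandshakeLimiter` is a hypothetical helper, not the API of any particular web server; servers that support this usually expose it as a configuration option instead.

```typescript
// Runs at most `maxConcurrent` tasks at once; the rest wait in FIFO order.
class HandshakeLimiter {
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots are busy: park until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot on directly; `active` stays unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

A server would wrap its per-connection handshake work in something like `limiter.run(() => doHandshake(conn))`, where `doHandshake` and `conn` are placeholders. Clients that arrive during a restart then simply wait a little longer instead of overloading the server.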