distributed-system, retry-logic, exponential-backoff

How does a client-side timeout help with server resource exhaustion?


I was going through the following document published by Amazon regarding timeouts, retries, and jitter: https://d1.awsstatic.com/builderslibrary/pdfs/timeouts-retries-and-backoff-with-jitter.pdf
While reading it, I came across the following passage:

When a client is waiting longer than usual for a request to complete, it also holds on to the resources it was using for that request for a longer time. When a number of requests hold on to resources for a long time, the server can run out of those resources. These resources can include memory, threads, connections, ephemeral ports, or anything else that is limited. To avoid this situation, clients set timeouts. Timeouts are the maximum amount of time that a client waits for a request to complete.

I would like to understand how a timeout on the client side could stop the processing of that request on the server and free the server's resources.


Solution

  • There might be a typo in the article: the writer probably meant to refer to the client instead of the server. Indeed, the paragraph should build on its first sentence, which highlights the client holding on to resources while waiting for requests.

    Still, a client timeout benefits both the client and the server, helping each of them stay resilient and cope with activity changes or spikes.

    Client

    Whether it is written with a low-level library or a high-level abstraction, a client request should almost always carry a timeout to limit its exposure to a misbehaving downstream dependency. As the article mentions, a client has a limited set of resources it can use, and it should expect to run short of them.

    As an example of these resources (and the most common one), client libraries maintain a pool of connections to avoid repeatedly paying the cost of establishing a new connection every time the client hits the same server. When a connection is leased from the pool and then used without a timeout, it may never return to the pool. Given that any downstream dependency can fail to respond (it is fair to always expect it to fail), the client can end up with an empty pool because all of its connections have been borrowed by requests that never complete.
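
    As a minimal sketch in Go (my own illustration, not from the article; the URL and pool size are arbitrary): the http.Client reuses pooled connections through its Transport, and the client-wide Timeout guarantees that a hung request eventually gives up instead of keeping its connection leased forever.

        package main

        import (
            "fmt"
            "net/http"
            "time"
        )

        func main() {
            // The Transport keeps a pool of idle connections and reuses them
            // for subsequent requests to the same host.
            client := &http.Client{
                Transport: &http.Transport{MaxIdleConnsPerHost: 10},
                // Upper bound on the whole request (dial, headers, body).
                // A request that exceeds it is aborted, so its connection is
                // not kept leased forever by an unresponsive server.
                Timeout: 2 * time.Second,
            }

            resp, err := client.Get("https://example.com")
            if err != nil {
                // On timeout the request is cancelled and the underlying
                // connection is discarded rather than leaked.
                fmt.Println("request failed:", err)
                return
            }
            // Reading and closing the body is what hands a healthy
            // connection back to the pool for reuse.
            defer resp.Body.Close()
            fmt.Println("status:", resp.Status)
        }

    Note that on a timeout the connection that was serving the hung request is discarded rather than reused, but the pool as a whole is no longer starved by requests that never finish.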

    Server

    On the other hand, and while this is ultimately out of its control, a server can benefit from a cooperative client that terminates a request on its end (note that the connection itself may still be open), so that the server's computation and memory resources are freed up.

    Imagine a long-running action initiated by a client request. A server short on memory can (given a proper implementation) abort the request in the middle of its execution when it detects the upstream timeout, leaving it with spare computation cycles to spend on housekeeping, such as triggering garbage collection, which helps it get back up to speed.
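
    To make this concrete, here is a minimal Go sketch of such a cooperative server (the /slow handler, port, and step timings are hypothetical, not from the article). Go's net/http cancels the request context when the client times out and drops the connection, so the handler can notice and abandon the long-running work instead of burning memory and CPU on a response nobody is waiting for.

        package main

        import (
            "fmt"
            "log"
            "net/http"
            "time"
        )

        // slowHandler simulates a long-running action broken into small steps
        // so it can check for cancellation between them.
        func slowHandler(w http.ResponseWriter, r *http.Request) {
            ctx := r.Context()
            for step := 1; step <= 10; step++ {
                select {
                case <-ctx.Done():
                    // The client timed out or closed the connection: abandon
                    // the work and free this goroutine's CPU and memory.
                    log.Printf("client gone at step %d: %v", step, ctx.Err())
                    return
                case <-time.After(500 * time.Millisecond): // one unit of "work"
                }
            }
            fmt.Fprintln(w, "done")
        }

        func main() {
            http.HandleFunc("/slow", slowHandler)
            // A client calling /slow with a timeout shorter than ~5 seconds
            // causes the handler to abort early through the cancelled context.
            log.Fatal(http.ListenAndServe(":8080", nil))
        }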