Regarding this statement from a blog post about Databricks SQL:
Throughput vs latency trade off
Throughput vs latency is the classic tradeoff in computer systems, meaning that a system cannot get high throughput and low latency simultaneously. If a design favors throughput (e.g. by batching data), it would have to sacrifice latency. In the context of data systems, this means a system cannot process large queries and small queries efficiently at the same time.
Doesn't low latency mean high throughput by definition? Why are they suggesting that low latency implies low throughput?
If throughput refers to the number of requests fulfilled in a given time and latency refers to the time taken to serve a single request, then surely less time per request means we can serve more requests in the same time frame.
For instance, if latency is 1 second per request, then the server can process 10 requests in 10 seconds.
If latency is reduced to 0.5 seconds per request, then the server's throughput is 20 requests in 10 seconds.
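A quick sketch of that arithmetic (my own illustration, assuming a strictly serial server with no queuing or contention, which is the hypothetical case here):

```python
# Hypothetical strictly serial server: throughput is simply window / latency.
def serial_throughput(latency_seconds: float, window_seconds: float = 10.0) -> float:
    """Requests a single-threaded server completes in the window."""
    return window_seconds / latency_seconds

print(serial_throughput(1.0))   # 10.0 requests in 10 seconds
print(serial_throughput(0.5))   # 20.0 requests in 10 seconds
```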
Shouldn't low latency mean high throughput by this definition?
You are correct that, as a general concept, a low-latency system takes less time to process a single operation and could therefore process more messages than the same system exhibiting a longer latency.
But in practice, especially in programming, a system's latency can be affected by its throughput. We may need to allow resources to be cleaned up and become ready again between cycles; some of those resources may be databases that enforce throttling limits, or other processes that have their own safe operating limits. At some point we will often hit limitations of a given processing model that force us to change our approach.
If we scale our processors out over more resources, we may observe a significant rise in the cost of processing per message, and even then we may still run into maximum-throughput limits.
In these systems it is common to observe latency increasing as the throughput requirement increases; low latency can only be affordably achieved at low throughput rates.
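To make that pattern concrete, here is a minimal single-worker queueing sketch of my own (not from the blog): per-message service time stays fixed, yet average latency climbs as the offered throughput approaches the worker's capacity, because messages spend more time waiting in the queue.

```python
import random

def average_latency_ms(arrival_rate_per_ms: float, service_ms: float = 10.0,
                       n_messages: int = 50_000, seed: int = 42) -> float:
    """Single worker, random arrivals: average time from arrival to completion."""
    rng = random.Random(seed)
    clock = 0.0            # arrival time of the current message
    worker_free_at = 0.0   # when the worker finishes its current message
    total_latency = 0.0
    for _ in range(n_messages):
        clock += rng.expovariate(arrival_rate_per_ms)   # next arrival
        start = max(clock, worker_free_at)              # wait if the worker is busy
        worker_free_at = start + service_ms
        total_latency += worker_free_at - clock         # queueing delay + service
    return total_latency / n_messages

# Capacity is 100 msg/s (10 ms each); latency climbs sharply as load nears it.
for rate in (0.02, 0.05, 0.08, 0.095):
    print(f"throughput {rate * 1000:.0f} msg/s -> avg latency {average_latency_ms(rate):.1f} ms")
```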
IoT and real-time processing are common domains where we may need a higher throughput than our low-latency system can deliver; often this is achieved by introducing batch processing.
Batch processing generally has significantly higher latency than most per-message flows, but overall it can allow a higher volume of messages to be processed using fewer resources.
In a batching system we can tune the throughput by altering the size of the batch: more messages in a batch means those messages have to wait longer before they are processed, which increases latency, but larger batch sizes may increase total throughput.
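A back-of-the-envelope sketch of that tuning knob (the cost model and numbers are hypothetical, chosen only to show the shape of the trade-off): each batch pays a fixed overhead plus a small per-message cost, and a message's latency includes the time it waits for its batch to fill.

```python
# Illustrative numbers only, not from the blog or any specific system.
ARRIVAL_INTERVAL_MS = 1.0    # one message arrives every millisecond
BATCH_OVERHEAD_MS   = 50.0   # fixed cost paid once per batch (e.g. setup/commit)
PER_MESSAGE_MS      = 0.2    # marginal cost of each message in the batch

def batch_stats(batch_size: int) -> tuple[float, float]:
    fill_time = batch_size * ARRIVAL_INTERVAL_MS            # waiting for the batch to fill
    process_time = BATCH_OVERHEAD_MS + batch_size * PER_MESSAGE_MS
    worst_case_latency = fill_time + process_time            # first message in the batch
    throughput = batch_size / (process_time / 1000)          # messages/second of capacity
    return worst_case_latency, throughput

for size in (1, 10, 100, 1000):
    latency, throughput = batch_stats(size)
    print(f"batch={size:5d}  latency~{latency:7.1f} ms  capacity~{throughput:8.0f} msg/s")
```

With these assumed costs, a batch size of 1 gives the lowest latency but cannot even keep up with the arrival rate, while large batches amortise the fixed overhead and raise throughput at the cost of latency.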
It is this batching scenario that the "low latency = low throughput" framing generally comes from. It is alluded to in this clip: https://www.youtube.com/watch?v=PXHLZGp-XMc
It is not that low-latency systems can only produce low throughput, but rather that low-throughput systems can more easily achieve low latencies.