We are seeing behavior where JVM performance degrades when the load is light. Specifically, across multiple runs in a test environment, we notice that latency worsens by around 100% when the rate of order messages pumped into the system is reduced. Some background on the issue is below; I would appreciate any help with this.
Simplistically, the demo Java trading application under investigation can be thought of as having three important threads: an order receiver thread, a processor thread, and an exchange transmitter thread.
The order receiver thread receives an order and puts it on a processor queue; the processor thread picks it up from the processor queue, does some basic processing, and puts it on an exchange queue; the exchange transmitter thread picks it up from the exchange queue and sends the order to the exchange.
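For reference, here is a minimal sketch of that pipeline, assuming plain `ArrayBlockingQueue` hand-offs between the stages (the class, queue, and method names are hypothetical, not the actual application's):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical three-stage pipeline: receiver -> processor -> transmitter.
public class OrderPipeline {
    record Order(long id, long receivedNanos) {}

    private final BlockingQueue<Order> processorQ = new ArrayBlockingQueue<>(1024);
    private final BlockingQueue<Order> exchangeQ  = new ArrayBlockingQueue<>(1024);

    // Entry point used by the receiver thread (or a test driver).
    void submit(Order order) throws InterruptedException {
        processorQ.put(order);
    }

    void start() {
        Thread processor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Order o = processorQ.take();   // parks the thread when the queue is empty
                    // ... basic processing ...
                    exchangeQ.put(o);
                } catch (InterruptedException e) { return; }
            }
        }, "processor");
        Thread transmitter = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Order o = exchangeQ.take();
                    long latencyNanos = System.nanoTime() - o.receivedNanos();
                    // ... send to the exchange, record latencyNanos ...
                } catch (InterruptedException e) { return; }
            }
        }, "transmitter");
        processor.start();
        transmitter.start();
    }
}
```

Note that `take()` parks a consumer thread whenever its queue is empty, which is exactly the situation at low message rates.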
The latency from order receipt to the order going out to the exchange worsens by around 100% when the rate of orders pumped into the system is changed from a high number to a low one.
Solutions tried:
Warming up the critical code path in the JVM by sending a high message rate and priming the system before reducing the message rate (see the sketch after this list): does not solve the issue.
Profiling the application: a profiler shows hotspots in the code where a 10-15% improvement may be had by improving the implementation, but nothing in the range of the 100% swing obtained just by changing the message rate.
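For reference, the warm-up step was conceptually like the sketch below: push enough synthetic orders through the critical path that the JIT compiles it at full optimization before the low-rate measurement begins (the order count and pause are illustrative, not our actual figures):

```java
// Hypothetical warm-up: drive the pipeline hard so the JIT compiles the
// critical path before we measure at the low message rate.
static void warmUp(OrderPipeline pipeline) throws InterruptedException {
    final int WARMUP_ORDERS = 200_000;   // comfortably above typical compile thresholds
    for (int i = 0; i < WARMUP_ORDERS; i++) {
        pipeline.submit(new OrderPipeline.Order(i, System.nanoTime()));
    }
    Thread.sleep(2_000);                 // let background JIT compilation settle
}
```

Even with the code fully compiled this way, latency still degraded once the rate dropped, which is what makes us suspect something below the JVM.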
Does anyone have any insights or suggestions on this? Could it have to do with scheduling jitter on the threads?
Could it be that, under the low message rate, the threads are being switched off their cores?
Two posts I think may be related are below; however, our symptoms are a bit different:
Consistent latency for low/medium load requires specific tuning of Linux.
Below are a few points from my old checklist which are relevant for components with millisecond latency requirements:
- isolcpus, to exclude dedicated cores from the scheduler
- taskset, to bind critical threads to specific cores
- numactl, to control NUMA placement of threads and memory

The Linux scheduler and power saving are key contributors to high variance of latency under low/medium load.
By default, a CPU core will reduce its frequency when inactive; as a consequence, your next request is processed more slowly on the downclocked core.
CPU cache is a key performance asset: if your critical thread is scheduled on different cores, it loses its cache data. Also, other threads scheduled on the same core will evict cached data, further increasing the latency of the critical code.
Under heavy load these factors are less important (frequency is maxed out, and threads are ~100% busy, tending to stick to specific cores).
Under low/medium load, though, these factors negatively affect both average latency and high percentiles (the 99th percentile may be an order of magnitude worse compared to the heavy-load case).
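One common mitigation for both effects, alongside isolcpus/taskset, is to busy-spin the critical consumer instead of blocking, so the thread never parks, stays resident on its core with a warm cache, and keeps that core clocked up. A minimal sketch, assuming a ConcurrentLinkedQueue hand-off (class and method names are illustrative):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical busy-spinning consumer: poll() never parks the thread, so it
// stays on its core with a warm cache instead of being descheduled.
class SpinningConsumer implements Runnable {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    void submit(Runnable task) { tasks.offer(task); }
    void stop()               { running = false; }

    @Override
    public void run() {
        while (running) {
            Runnable task = tasks.poll();   // non-blocking; null when the queue is empty
            if (task == null) {
                Thread.onSpinWait();        // JDK 9+ hint that this is a spin loop
                continue;
            }
            task.run();
        }
    }
}
```

The trade-off is a core pegged at 100% even when idle, which usually only makes sense when the spinning thread has a core to itself (isolcpus plus pinning).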
For high-throughput applications (above 100k requests/sec), advanced inter-thread communication approaches (e.g., the LMAX Disruptor) are also useful.
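To illustrate (my sketch, not from the original checklist): a minimal single-producer hand-off using the com.lmax.disruptor API with a busy-spin wait strategy; the event and class names are hypothetical. The ring buffer preallocates its events, so the hot path does not allocate:

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorHandoff {
    // Hypothetical mutable event, preallocated by the ring buffer.
    static final class OrderEvent {
        long orderId;
        long receivedNanos;
    }

    public static void main(String[] args) {
        Disruptor<OrderEvent> disruptor = new Disruptor<>(
                OrderEvent::new,                 // factory fills the ring buffer up front
                1024,                            // ring buffer size (power of two)
                DaemonThreadFactory.INSTANCE,
                ProducerType.SINGLE,             // one receiver thread publishes
                new BusySpinWaitStrategy());     // consumer spins instead of parking

        // Consumer: plays the role of the processor/transmitter stage.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("order " + event.orderId));

        RingBuffer<OrderEvent> ring = disruptor.start();

        // Producer: claim a slot, fill the preallocated event in place, publish.
        long seq = ring.next();
        try {
            OrderEvent e = ring.get(seq);
            e.orderId = 42L;
            e.receivedNanos = System.nanoTime();
        } finally {
            ring.publish(seq);
        }

        disruptor.shutdown();
    }
}
```

BusySpinWaitStrategy keeps the consumer spinning rather than parking, the same idea as the hand-rolled loop above but with the Disruptor's cache-friendly ring buffer underneath.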