Tags: ruby, performance, jruby, mri

What are examples of typical workloads where MRI outperforms JRuby?


I have a Ruby web service for which I recently checked whether using JRuby (9.1.17.0 on OpenJDK 1.8) would improve performance relative to the MRI (2.5.0) it currently runs on. I expected it might, because the performance bottleneck is the large amount of 'basic arithmetic' performed to calculate the response data, and JRuby tends to outperform MRI on computation-heavy benchmarks.

However, this turns out not to be the case: I've tried many combinations of JRuby/JVM options, but the 'steady state' is 2x slower than MRI. The steady state is reached after repeating the request roughly 100 times, by which point the JVM has clearly done its JIT magic, as performance improves by a factor of 2.5 relative to the initial request.

I would like to understand whether this is expected or unexpected behavior. So I am wondering: what are typical workloads on which JRuby can be expected to be slower than MRI? And is 'basic arithmetic on floats' indeed among them?

(The performance bottleneck is in the same place in MRI and JRuby, as determined with appropriate profilers. Originally this post said JRuby was only 20% slower, but I've since introduced an optimization that improved MRI performance by a factor of almost 2 yet hardly changed JRuby performance. I suspect the JVM had already performed the same optimization automatically, as it basically amounted to 'constant folding'.)


Solution

  • If you are doing computations on Integers, and the Integers fit into native_word_size - 1 bits, then YARV will use native machine arithmetic on Fixnums. If you are doing computations on Floats, you are on a 64-bit platform, and your computations fit into 62 bits, then YARV will use native FPU arithmetic on flonums. In either case, it doesn't get much faster than that, unless your operations are so trivial that the JVM JIT (or the JRuby compiler) can optimize them away completely, constant-fold them, or do something similar.
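
    A quick way to see these immediate representations on MRI is ObjectSpace.memsize_of, which reports 0 bytes for values encoded directly in the tagged pointer. The boundary values below assume a typical 64-bit MRI build; exact heap sizes vary:

    ```ruby
    require 'objspace'

    # Immediates (Fixnums, flonums) live inside the tagged pointer itself,
    # so MRI reports no heap memory for them.
    ObjectSpace.memsize_of(2**62 - 1)  # => 0, largest immediate Integer
    ObjectSpace.memsize_of(2**62)      # => non-zero, heap-allocated big integer

    ObjectSpace.memsize_of(1.5)        # => 0, flonum-encoded Float
    ObjectSpace.memsize_of(1.0e300)    # => non-zero, exponent outside the flonum range
    ```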

    The sweet spot is Integers that are larger than 63 bits but smaller than 64 bits, which are treated as native machine integers by JRuby but not by YARV; the same goes for Floats larger than 62 bits but smaller than 64 bits. In this range, JRuby will use native operations but YARV won't, which gives JRuby a performance advantage.
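
    A minimal sketch of a micro-benchmark aimed at that range (the magnitudes are assumptions; the point is that the accumulator stays above MRI's immediate Integer range but within a signed 64-bit long):

    ```ruby
    require 'benchmark'

    # Values just above 2**62 do not fit MRI's immediate Integers,
    # but they still fit a signed 64-bit Java long, so JRuby can in
    # principle keep the arithmetic in native machine words.
    def sweet_spot_sum
      acc = 2**62                        # just past MRI's Fixnum range
      1_000_000.times { |i| acc += i }   # stays well below 2**63
      acc
    end

    puts Benchmark.realtime { sweet_spot_sum }
    ```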

    In general, YARV outperforms JRuby on latency, particularly startup time. This depends a lot on the JVM used and on the environment, though. There are JVMs designed to start up very fast, e.g. IBM J9 (which IMO should be the default desktop JVM instead of Oracle HotSpot) or Avian (which is not actually a JVM, as it only implements a subset of the JVM and JRE specs, but can nevertheless run many non-trivial programs that don't use any of the non-implemented features, JRuby being one of them). Also, there are environments and configurations that allow you to keep and re-use a JVM and a JRuby instance in memory, eliminating much of the startup time.
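
    For the startup-time part specifically, a crude comparison is to time a no-op script under each interpreter. This sketch assumes both binaries are on the PATH; JRuby's --dev flag trades peak JIT performance for faster startup:

    ```ruby
    require 'benchmark'

    # Crude cold-start comparison: spawn each interpreter on an empty script.
    ['ruby', 'jruby', 'jruby --dev'].each do |cmd|
      t = Benchmark.realtime { system("#{cmd} -e ''") }
      puts format('%-12s %.2fs', cmd, t)
    end
    ```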

    The second biggie is YARV C extensions. YARV has a very open and wide API for C extensions: essentially, YARV C extensions can access pretty much every private internal implementation detail of YARV. (Which obviously means they can corrupt and crash YARV.) JVM "C extensions", on the other hand, always have to go through a security barrier. They can only corrupt memory that has been explicitly handed to them by the Java code that calls them; they can never corrupt other memory, let alone the JVM itself. However, this comes at a performance cost: calling C from Java, or vice versa, is generally slower than calling C from YARV, or vice versa.

    YARV C extensions running on JRuby are even slower than that, since JRuby essentially has to provide an entire complex emulation layer, emulating the internal data structures, functions, and memory layout of YARV, just to get at least some YARV C extensions to run. This is simply slow. Period.

    Note that this does not apply to Ruby wrappers for C libraries that use the Ruby FFI API. Those don't rely on YARV internals and thus need no emulation layer, and JRuby has quite a fast and well-optimized implementation of the Ruby FFI API. The cost of JVM ↔ C bridging still applies, though.
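
    For illustration, a minimal FFI binding (here wrapping libc's strlen, using the ffi gem) runs unchanged on MRI and JRuby, because it never touches YARV internals:

    ```ruby
    require 'ffi'

    # Binds a C function through the Ruby FFI API rather than the YARV C API.
    module CLib
      extend FFI::Library
      ffi_lib FFI::Library::LIBC
      attach_function :strlen, [:string], :size_t
    end

    puts CLib.strlen('hello')  # => 5
    ```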

    These are the two big areas where YARV is faster: code that doesn't run long enough to take advantage of the JVM's optimizations for long-running processes, and code that makes heavy use of calls to and from C, especially YARV C extensions.

    If you can get your code to run on TruffleRuby, that would be an interesting experiment. The optimizations TruffleRuby can perform are truly amazing (e.g. folding an entire Ruby library that makes significant use of dynamic metaprogramming, reflection, and Hash lookups into a single constant), and it can approach and even beat hand-optimized C. Also, TruffleRuby contains a C interpreter in addition to its Ruby interpreter, and can thus analyze and optimize Ruby code calling into C extensions and vice versa, and even perform cross-language inlining, which means that in some benchmarks it can execute Ruby code making heavy use of YARV C extensions faster than YARV itself!
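
    As a rough illustration of the kind of code such a partial evaluator can collapse (a hypothetical toy, not taken from any real benchmark): once the arguments are compile-time constants, the Hash lookup, the lambda call, and the arithmetic below can in principle all fold away into a single constant.

    ```ruby
    # Dynamic dispatch through a Hash of lambdas: an aggressive JIT with
    # cross-call inlining can reduce apply(:mul, 6, 7) to the constant 42.
    OPS = {
      add: ->(a, b) { a + b },
      mul: ->(a, b) { a * b }
    }.freeze

    def apply(op, a, b)
      OPS.fetch(op).call(a, b)
    end

    puts apply(:mul, 6, 7)
    ```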