I'm not compiling anything to native; in other words, I'm not using native-image from GraalVM. I'm just running the same Java class (the same bytecode) with GraalVM and then with regular Oracle JVMs.
It does not matter which Java version or platform I use (I tested on Linux and Mac): GraalVM is consistently about 30x faster than any regular JVM.
It looks like the regular JVMs are not fully optimizing this method with their JIT compilers, even though the method is very simple and small.
Does anyone have any insight into why that's the case and how I could fix it on my regular JVM? The only workaround at the moment is to migrate to GraalVM. It is very easy to reproduce the issue with the code below: compile it once, run it with an Oracle JVM, then run it with any Graal JVM and compare.
public class OracleJvm23MathBug {

    // simple and small amount of math
    // =====> should be optimized/compiled/inlined for sure!
    private static final long doSomething(int load, int i) {
        long x = 0;
        for (int j = 0; j < load; j++) {
            long pow = (i % 8) * (i % 16);
            if (i % 2 == 0) {
                x += pow;
            } else {
                x -= pow;
            }
        }
        return x;
    }

    /*
     * Execute this with OpenJDK/Zulu/Oracle JVM 23 => average 215 nanoseconds
     * Now execute this with GraalVM JDK 23 => average 7 nanoseconds
     *
     * This bug can be observed on any platform (I tested on Linux and Mac)
     *
     * $ java -version
     * java version "23.0.1" 2024-10-15
     * Java(TM) SE Runtime Environment (build 23.0.1+11-39)
     * Java HotSpot(TM) 64-Bit Server VM (build 23.0.1+11-39, mixed mode, sharing)
     *
     * $ java -cp . OracleJvm23MathBug
     * Value computed: -550000000000
     * Measurements: 10000000 | Avg Time: 215 nanos | Min Time: 83 nanos | Max Time: 199750 nanos
     *
     * $ java -version
     * java version "23.0.1" 2024-10-15
     * Java(TM) SE Runtime Environment Oracle GraalVM 23.0.1+11.1 (build 23.0.1+11-jvmci-b01)
     * Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 23.0.1+11.1 (build 23.0.1+11-jvmci-b01, mixed mode, sharing)
     *
     * $ java -cp . OracleJvm23MathBug
     * Value computed: -550000000000
     * Measurements: 10000000 | Avg Time: 7 nanos | Min Time: 0 nanos | Max Time: 178625 nanos
     */
    public static final void main(String[] args) {
        final int iterations = 10_000_000;
        final int load = 10_000;
        NanoBench bench = new NanoBench();
        long computed = 0;
        for (int i = 0; i < iterations; i++) {
            bench.mark();
            computed += doSomething(load, i);
            bench.measure();
        }
        System.out.println("Value computed: " + computed);
        bench.printResults();
    }

    private static class NanoBench {

        private int measurements;
        private long totalTime, minTime, maxTime, time;
        private final StringBuilder sb = new StringBuilder(128);

        NanoBench() {
            reset();
        }

        public final void reset() {
            totalTime = time = measurements = 0;
            maxTime = Long.MIN_VALUE;
            minTime = Long.MAX_VALUE;
        }

        public final void mark() {
            time = System.nanoTime();
        }

        public final void measure() {
            long lastNanoTime = System.nanoTime() - time;
            totalTime += lastNanoTime;
            minTime = lastNanoTime < minTime ? lastNanoTime : minTime;
            maxTime = lastNanoTime > maxTime ? lastNanoTime : maxTime;
            measurements++;
        }

        public final void printResults() {
            sb.setLength(0);
            sb.append("Measurements: ").append(measurements);
            sb.append(" | Avg Time: ").append((long) (totalTime / (double) measurements)).append(" nanos");
            sb.append(" | Min Time: ").append(minTime).append(" nanos");
            sb.append(" | Max Time: ").append(maxTime).append(" nanos\n\n");
            for (int i = 0; i < sb.length(); i++) System.out.print(sb.charAt(i));
        }
    }
}
GraalVM uses a newer JIT compiler (the Graal compiler) written in Java. I wouldn't say it is better or worse than HotSpot's C2 JIT compiler, which is quite old and written in C++. As Adam Ruka puts it in a blog post:
The JIT compiler that is used in most distributions of the JVM is HotSpot. It’s the first Java JIT compiler, and thus its codebase is quite old. It’s also written in C++; and since a JIT compiler is a very complex piece of technology, C++’s unmanaged nature means every potential bug in its code can lead to very serious JVM issues, like runtime crashes, security vulnerabilities, or memory leaks. All of those factors mean that developing HotSpot is so difficult that only a few experts in the world can realistically do it. This slow pace of development means that HotSpot lags behind the current state of the art when it comes to supporting all of the newest optimizations.
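A likely contributor in this particular benchmark (an observation about the code itself, not a claim about either compiler's internals): `pow` depends only on `i`, never on the loop variable `j`, so the whole loop is loop-invariant and collapses to a single multiplication. A compiler that hoists it can make the call near-free, which would explain single-digit nanosecond averages. A manually hoisted variant (the name `doSomethingHoisted` is mine) that is mathematically equivalent and should be fast on any JVM:

```java
public class HoistedMathDemo {

    // Original method from the question, kept for comparison.
    static long doSomething(int load, int i) {
        long x = 0;
        for (int j = 0; j < load; j++) {
            long pow = (i % 8) * (i % 16);
            if (i % 2 == 0) {
                x += pow;
            } else {
                x -= pow;
            }
        }
        return x;
    }

    // Neither pow nor the branch depends on j, so the loop is just
    // `load` repetitions of the same +/- pow: a single multiply.
    static long doSomethingHoisted(int load, int i) {
        long pow = (i % 8) * (i % 16);
        return (i % 2 == 0) ? pow * load : -pow * load;
    }

    public static void main(String[] args) {
        // Sanity-check equivalence over the same inputs the benchmark uses.
        for (int i = 0; i < 1_000; i++) {
            if (doSomething(10_000, i) != doSomethingHoisted(10_000, i)) {
                throw new AssertionError("mismatch at i=" + i);
            }
        }
        System.out.println("hoisted version matches");
    }
}
```

If a hand-hoisted version runs fast on the stock JVM too, that points at a missed loop-invariant/strength-reduction optimization rather than anything exotic.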
But not all is lost: you can enable the Graal JIT compiler on the latest Oracle JVM 23 with:
-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler
If you run your code with those options, you will see that it runs just as fast.
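To double-check which JIT compiler the running JVM actually reports, one option is the standard JMX `CompilationMXBean`. This is a small sketch; the exact name string it prints is JVM-specific (e.g. "HotSpot 64-Bit Tiered Compilers" on a stock HotSpot build), so treat it as a diagnostic hint rather than a definitive probe:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitCheck {
    public static void main(String[] args) {
        // Prints the name of the JIT compiler subsystem the running
        // JVM reports via JMX; the string varies by JVM build.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT compiler: " + jit.getName());
    }
}
```

Run it once with the plain JVM and once with the flags above to confirm you are comparing the configurations you think you are.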
For the record, here are my results using CoralBench:
Measurements: 10,000,000 | Warm-Up: 0 | Iterations: 10,000,000
Avg Time: 376.860 nanos | Min Time: 125.000 nanos | Max Time: 151.625 micros
75% = [avg: 368.000 nanos, max: 375.000 nanos]
90% = [avg: 369.000 nanos, max: 416.000 nanos]
99% = [avg: 374.000 nanos, max: 500.000 nanos]
99.9% = [avg: 375.000 nanos, max: 583.000 nanos]
99.99% = [avg: 375.000 nanos, max: 5.083 micros]
99.999% = [avg: 376.000 nanos, max: 16.875 micros]
Measurements: 10,000,000 | Warm-Up: 0 | Iterations: 10,000,000
Avg Time: 8.190 nanos | Min Time: 0.000 nano | Max Time: 161.000 micros
75% = [avg: 0.000 nano, max: 0.000 nano]
90% = [avg: 0.000 nano, max: 41.000 nanos]
99% = [avg: 4.000 nanos, max: 42.000 nanos]
99.9% = [avg: 5.000 nanos, max: 42.000 nanos]
99.99% = [avg: 5.000 nanos, max: 21.708 micros]
99.999% = [avg: 7.000 nanos, max: 24.625 micros]
Measurements: 10,000,000 | Warm-Up: 0 | Iterations: 10,000,000
Avg Time: 3.354 micros | Min Time: 3.166 micros | Max Time: 2.390 millis
75% = [avg: 3.286 micros, max: 3.292 micros]
90% = [avg: 3.292 micros, max: 3.334 micros]
99% = [avg: 3.305 micros, max: 4.500 micros]
99.9% = [avg: 3.336 micros, max: 12.458 micros]
99.99% = [avg: 3.347 micros, max: 19.958 micros]
99.999% = [avg: 3.350 micros, max: 106.250 micros]