While optimizing some locking code, I used a JMH benchmark to see how much locking an already locked ReentrantLock costs compared to locking it just once. I was surprised to see that JDK 11 performed better than JDK 21. It would be really nice to understand why, and whether my benchmark is correct at all.
I also added a benchmark with a synchronized block and one without any locking at all. As expected, the synchronized block is optimized and performs almost the same as the lock-free one, and there is no degradation between the JDK versions.
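import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LockNoLockBenchmark {
    int counter;
    ReentrantLock lock = new ReentrantLock();

    // Baseline: plain increment, no locking.
    @Benchmark
    public void noLock() {
        ++counter;
    }

    // synchronized on a fresh local object.
    @Benchmark
    public void syncLock() {
        synchronized (new Object()) {
            ++counter;
        }
    }

    // Single, uncontended lock/unlock pair.
    @Benchmark
    public void lockUnlock() {
        lock.lock();
        try {
            ++counter;
        } finally {
            lock.unlock();
        }
    }

    // Recursive acquisition: the inner lock() finds the lock already held by this thread.
    @Benchmark
    public void lockLockUnlockUnlock() {
        lock.lock();
        try {
            lock.lock();
            try {
                ++counter;
            } finally {
                lock.unlock();
            }
        } finally {
            lock.unlock();
        }
    }
}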
Run on Intel Rocket Lake (Core i9): 12th Gen Intel(R) Core(TM) i9-12950HX, 12 cores, 64 GB RAM
openjdk 21.0.2 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-58)
OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)
Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  27.457 ± 0.876  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.409 ± 0.256  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.280 ± 0.010  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.280 ± 0.008  ns/op
openjdk 11.0.21 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  22.414 ± 1.366  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.690 ± 0.407  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.283 ± 0.021  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.289 ± 0.012  ns/op
I'd expect no performance degradation in this case with JDK 21. I am also interested in ways to optimize code that needs to acquire an already locked lock. Thank you.
In JDK 14, there was a massive rewrite of java.util.concurrent internals in the context of JDK-8229442. The goal was to improve the overall performance of concurrent primitives and to prepare the implementation for virtual threads.
However, as often happens, improvements in one scenario are accompanied by a regression in another.
In JDK 11, the code for recursive locking looks as follows. It has a fast path that checks whether the lock is owned by the current thread. Note that there is no atomic compareAndSet operation on this path.
final boolean nonfairTryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}
In JDK 21, the code looks a bit different. initialTryLock always executes compareAndSetState, whether the acquisition is recursive or not, and that's where the performance difference comes from.
final boolean initialTryLock() {
    Thread current = Thread.currentThread();
    if (compareAndSetState(0, 1)) { // first attempt is unguarded
        setExclusiveOwnerThread(current);
        return true;
    } else if (getExclusiveOwnerThread() == current) {
        int c = getState() + 1;
        if (c < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(c);
        return true;
    } else
        return false;
}
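To make the difference between the two listings concrete, here is a minimal toy model of both acquire strategies. It is only a sketch and not the JDK implementation: the class `ToyReentrantAcquire`, its `casAttempts` counter, and both method names are made up for illustration, release and overflow handling are omitted, and the counter is only meaningful in a single-threaded demo.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the two acquire strategies; hypothetical, not the JDK code. */
class ToyReentrantAcquire {
    final AtomicInteger state = new AtomicInteger(); // 0 = free, n > 0 = hold count
    Thread owner;                                    // plain field, written by the owner only
    int casAttempts;                                 // demo counter, single-threaded use only

    private boolean cas(int expect, int update) {
        casAttempts++;
        return state.compareAndSet(expect, update);
    }

    /** JDK 11 style: read the state first, CAS only when the lock looks free. */
    boolean acquireCheckFirst() {
        Thread current = Thread.currentThread();
        int c = state.get();
        if (c == 0) {
            if (cas(0, 1)) {
                owner = current;
                return true;
            }
        } else if (owner == current) {
            state.set(c + 1); // recursive acquire: no CAS on this path
            return true;
        }
        return false;
    }

    /** JDK 21 style: unguarded CAS first, owner check only after the CAS fails. */
    boolean acquireCasFirst() {
        Thread current = Thread.currentThread();
        if (cas(0, 1)) {
            owner = current;
            return true;
        } else if (owner == current) {
            state.set(state.get() + 1); // recursive acquire, reached after a failed CAS
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ToyReentrantAcquire checkFirst = new ToyReentrantAcquire();
        checkFirst.acquireCheckFirst();
        checkFirst.acquireCheckFirst();

        ToyReentrantAcquire casFirst = new ToyReentrantAcquire();
        casFirst.acquireCasFirst();
        casFirst.acquireCasFirst();

        System.out.println("check-first CAS attempts: " + checkFirst.casAttempts);
        System.out.println("CAS-first CAS attempts:   " + casFirst.casAttempts);
    }
}
```

Acquiring twice from the same thread, the check-first variant performs one CAS and the CAS-first variant performs two, the second of which fails. The toy only counts operations; how much a failed CAS costs on a given CPU is something the JMH numbers have to show.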
The aforementioned refactoring already caused a performance regression earlier, which was later fixed. If your question arose from a real issue in production, you're welcome to submit a bug report.
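As for avoiding the cost of acquiring a lock that the current thread already holds, one option worth measuring is to skip the nested lock()/unlock() pair altogether. The sketch below uses the real ReentrantLock.isHeldByCurrentThread() method; the helper class `HeldLockGuard` and its `runLocked` method are hypothetical names for illustration. Whether this is actually faster than a recursive lock() on your JDK is an assumption you would need to confirm with a benchmark.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: skips the nested acquire when the caller already owns the lock. */
final class HeldLockGuard {
    private HeldLockGuard() {}

    static void runLocked(ReentrantLock lock, Runnable action) {
        if (lock.isHeldByCurrentThread()) {
            // Already inside the critical section: run without a recursive acquire.
            action.run();
            return;
        }
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        int[] counter = {0};
        // The inner call sees the lock as held and does not lock again.
        runLocked(lock, () -> runLocked(lock, () -> counter[0]++));
        System.out.println("counter = " + counter[0]
                + ", hold count after = " + lock.getHoldCount());
    }
}
```

A structural alternative is to split the code into public methods that take the lock and private methods that assume it is already held, so the recursive acquisition never happens in the first place.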
As a side note, your syncLock() benchmark does not actually measure the performance of synchronized, since locking on a local non-escaping object is a no-op, and the JIT compiler happily eliminates the unnecessary locking altogether.
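To measure synchronized itself, the lock object would need to be one the JIT cannot prove to be thread-local, for example a field of the benchmark state. The sketch below is plain Java so that it runs without JMH; the class `SyncVariants` and the `monitor` field are illustrative names, and in the real benchmark each method would carry @Benchmark.

```java
/** Plain-Java sketch; in a JMH class each method would be annotated with @Benchmark. */
class SyncVariants {
    int counter;
    final Object monitor = new Object(); // shared field, reachable from other threads

    /** The lock object never escapes, so the JIT may remove the locking entirely. */
    void syncOnLocal() {
        synchronized (new Object()) {
            ++counter;
        }
    }

    /** The lock object lives in a field, so escape analysis cannot treat it as thread-local. */
    void syncOnField() {
        synchronized (monitor) {
            ++counter;
        }
    }

    public static void main(String[] args) {
        SyncVariants v = new SyncVariants();
        v.syncOnLocal();
        v.syncOnField();
        System.out.println("counter = " + v.counter);
    }
}
```

Both methods behave identically from the program's point of view; the difference only shows up in what the JIT is allowed to optimize away, which is why it matters for the benchmark.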