java performance jmh java-21

Why is acquiring an already-locked lock slower on JDK 21 than on JDK 11?


While optimizing some locking code, I used a JMH benchmark to see how much locking an already-locked ReentrantLock costs compared to locking it just once. I was surprised to see that JDK 11 performed better than JDK 21. It would be really nice to understand why, and whether my benchmark is correct at all.

I also added benchmarks with a synchronized block and without any locking at all. As expected, the synchronized block is optimized and performs almost the same as the lock-free version, and there is no degradation between JDK versions.

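import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)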
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LockNoLockBenchmark {
  int counter;

  ReentrantLock lock = new ReentrantLock();


  @Benchmark
  public void noLock() {
    ++counter;
  }

  @Benchmark
  public void syncLock() {
    synchronized (new Object()) {
      ++counter;
    }
  }

  @Benchmark
  public void lockUnlock() {
    lock.lock();
    try {
      ++counter;
    } finally {
      lock.unlock();
    }
  }

  @Benchmark
  public void lockLockUnlockUnlock() {
    lock.lock();
    try {
      lock.lock();
      try {
        ++counter;
      } finally {
        lock.unlock();
      }
    } finally {
      lock.unlock();
    }
  }
}

Run on a 12th Gen Intel(R) Core(TM) i9-12950HX (Alder Lake), 12 cores, 64 GB RAM.

  1. JDK 21
openjdk 21.0.2 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-58)
OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)

Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  27.457 ± 0.876  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.409 ± 0.256  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.280 ± 0.010  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.280 ± 0.008  ns/op

  2. JDK 11
openjdk 11.0.21 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Benchmark                                 Mode  Cnt   Score   Error  Units
LockNoLockBenchmark.lockLockUnlockUnlock  avgt   10  22.414 ± 1.366  ns/op
LockNoLockBenchmark.lockUnlock            avgt   10  11.690 ± 0.407  ns/op
LockNoLockBenchmark.noLock                avgt   10   0.283 ± 0.021  ns/op
LockNoLockBenchmark.syncLock              avgt   10   0.289 ± 0.012  ns/op

I'd expect no performance degradation in this case with JDK 21. I am also interested in ways to optimize code that needs to acquire an already-locked lock. Thank you.


Solution

  • In JDK 14, there was a massive rewrite of java.util.concurrent internals in the context of JDK-8229442. The goal was to improve overall performance of concurrent primitives and prepare the implementation for virtual threads.

    However, as often happens, an improvement in one scenario comes with a regression in another.

    In JDK 11, the code for recursive locking looks as follows. It has a fast path for checking if the lock is owned by the current thread. Note that there is no atomic compareAndSet operation on this path.

    final boolean nonfairTryAcquire(int acquires) {
        final Thread current = Thread.currentThread();
        int c = getState();
        if (c == 0) {
            if (compareAndSetState(0, acquires)) {
                setExclusiveOwnerThread(current);
                return true;
            }
        }
        else if (current == getExclusiveOwnerThread()) {
            int nextc = c + acquires;
            if (nextc < 0) // overflow
                throw new Error("Maximum lock count exceeded");
            setState(nextc);
            return true;
        }
        return false;
    }
    

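    In JDK 21, the code looks a bit different. initialTryLock always executes compareAndSetState first, whether or not the lock is already held by the current thread. In the recursive case that CAS is bound to fail because the state is non-zero, yet it is still paid for, and that's where the performance difference comes from.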

    final boolean initialTryLock() {
        Thread current = Thread.currentThread();
        if (compareAndSetState(0, 1)) { // first attempt is unguarded
            setExclusiveOwnerThread(current);
            return true;
        } else if (getExclusiveOwnerThread() == current) {
            int c = getState() + 1;
            if (c < 0) // overflow
                throw new Error("Maximum lock count exceeded");
            setState(c);
            return true;
        } else
            return false;
    }
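
    To make the difference concrete, here is a minimal single-threaded model of the two acquisition strategies. It is only a sketch: the class ReentrantPathSketch, its methods and the casAttempts counter are hypothetical names invented for illustration, not JDK code, and the model is not a usable lock (the owner field is not safely published between threads). It merely counts how many compareAndSet calls each strategy issues for a nested acquisition.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model, NOT the JDK implementation and NOT a real lock.
public class ReentrantPathSketch {
    private final AtomicInteger state = new AtomicInteger(); // hold count, 0 = free
    private Thread owner;                                     // plain field: toy only
    int casAttempts;                                          // CAS calls issued so far

    // JDK 11 style: read the state first, CAS only if the lock looks free.
    boolean acquireGuarded() {
        Thread current = Thread.currentThread();
        int c = state.get();
        if (c == 0) {
            casAttempts++;
            if (state.compareAndSet(0, 1)) {
                owner = current;
                return true;
            }
        } else if (owner == current) {
            state.set(c + 1); // recursive path: plain write, no CAS
            return true;
        }
        return false;
    }

    // JDK 21 style: the first CAS attempt is unguarded.
    boolean acquireUnguarded() {
        Thread current = Thread.currentThread();
        casAttempts++;
        if (state.compareAndSet(0, 1)) {
            owner = current;
            return true;
        } else if (owner == current) {
            state.set(state.get() + 1); // reached only after a failed CAS
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ReentrantPathSketch guarded = new ReentrantPathSketch();
        guarded.acquireGuarded();
        guarded.acquireGuarded();
        System.out.println("guarded CAS attempts: " + guarded.casAttempts);     // 1

        ReentrantPathSketch unguarded = new ReentrantPathSketch();
        unguarded.acquireUnguarded();
        unguarded.acquireUnguarded();
        System.out.println("unguarded CAS attempts: " + unguarded.casAttempts); // 2
    }
}
```

    In this model the guarded variant issues one CAS for two nested acquisitions, while the unguarded variant issues two; the extra one is the CAS that cannot succeed.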
    

    The aforementioned refactoring already caused a performance regression earlier, which was later fixed. If your question arose from a real issue in production, you're welcome to submit a bug report.
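
    As for optimizing code that needs to acquire a lock it may already hold, one possible caller-side workaround is to skip the nested acquisition when the current thread is already the owner. The sketch below uses ReentrantLock.isHeldByCurrentThread(), which compares the owner thread without a CAS. The class GuardedReentry and its runLocked helper are hypothetical names for illustration. Note that this changes the hold count (the nested call no longer increments it), and whether it is actually faster in your case is an assumption to verify with a benchmark.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper class, shown as a sketch rather than a recommended API.
public class GuardedReentry {
    final ReentrantLock lock = new ReentrantLock();
    int counter;

    // Runs the action under the lock, acquiring it only if this thread
    // does not hold it yet.
    void runLocked(Runnable action) {
        if (lock.isHeldByCurrentThread()) {
            action.run(); // already protected by the outer acquisition
            return;
        }
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedReentry g = new GuardedReentry();
        // The inner call sees the lock as held and does not lock again.
        g.runLocked(() -> g.runLocked(() -> g.counter++));
        System.out.println("counter = " + g.counter + ", locked = " + g.lock.isLocked());
    }
}
```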

    As a side note, your syncLock() benchmark does not actually measure the performance of synchronized: locking on a local, non-escaping object is a no-op, and the JIT compiler eliminates the unnecessary locking altogether.
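
    A variant that is more likely to measure real monitor operations would synchronize on an object stored in a field, since such an object is reachable from outside the method and escape analysis cannot treat it as thread-local. The sketch below shows both forms side by side. The class and method names are hypothetical, and the JMH annotations are left out to keep the example self-contained; in the benchmark class the methods would carry @Benchmark.

```java
// Hypothetical illustration of the two locking targets; not a benchmark by itself.
public class SharedMonitorCounter {
    // Stored in a field, so the monitor escapes the method that locks on it.
    private final Object monitor = new Object();
    int counter;

    // Locks on the shared monitor: the lock cannot be proven thread-local.
    void syncOnSharedMonitor() {
        synchronized (monitor) {
            ++counter;
        }
    }

    // Locks on a fresh local object, as in the original syncLock():
    // the JIT compiler is free to remove this lock entirely.
    void syncOnLocalObject() {
        synchronized (new Object()) {
            ++counter;
        }
    }

    public static void main(String[] args) {
        SharedMonitorCounter c = new SharedMonitorCounter();
        c.syncOnSharedMonitor();
        c.syncOnLocalObject();
        System.out.println("counter = " + c.counter); // 2
    }
}
```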