javaperformancejvm

AtomicInteger implementation and code duplication


Warning: the question is a little long, but the part below the separation line is for curiosity only.

Oracle's JDK 7 implementation of AtomicInteger includes the following methods:

public final int addAndGet(int delta) {
    for (;;) {
        int current = get();
        int next = current + delta;         // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}

public final int incrementAndGet() {
    for (;;) {
        int current = get();
        int next = current + 1;             // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}

It seems clear that the second method could have been written:

public final int incrementAndGet() {
    return addAndGet(1);
}

There are several other examples of similar code duplication in that class. I can't think of any reasons to do that but performance considerations (*). And I am pretty sure the authors did some in-depth testing before settling on that design.

Why (or in what circumstances) would the first code perform better than the second?


(*) I could not resist but write a quick micro benchmark. It shows (post-JIT) a systematic gap of 2-4% performance in favour of addAndGet(1) vs incrementAndGet() (that is admittedly small, but it is very consistent). I can't really explain that result either to be honest...

Output:

incrementAndGet(): 905
addAndGet(1): 868
incrementAndGet(): 902
addAndGet(1): 863
incrementAndGet(): 891
addAndGet(1): 867
...

Code:

public static void main(String[] args) throws Exception {
    final int size = 100_000_000;
    long start, end;
    AtomicInteger ai;

    System.out.println("JVM warmup");
    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
    }


    System.out.println("\nStart measuring\n");

    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
    }
}

Solution

  • I'll give a new conjecture. If we look into the byte code of AtomicInteger we will see that the main difference between them is that addAndGet uses the iload_ instruction, whereas the incrementAndGet uses iconst_ instruction:

    public final int addAndGet(int);
       ...
       4:   istore_2
       5:   iload_2
       6:   iload_1
       7:   iadd
     
    public final int incrementAndGet();
       ...
       4:   istore_1
       5:   iload_1
       6:   iconst_1
       7:   iadd
    

    It seem, that iconst_+iadd translates as INC instruction, due to iload_...iadd as ADD instruction. This all relates to the commonly known question about ADD 1 vs INC and so on:

    Relative performance of x86 inc vs. add instruction

    Is ADD 1 really faster than INC ? x86

    This could be the answer, why addAndGet is slightly faster than incrementAndGet