Warning: the question is a little long, but the part below the separation line is for curiosity only.
Oracle's JDK 7 implementation of AtomicInteger includes the following methods:
public final int addAndGet(int delta) {
    for (;;) {
        int current = get();
        int next = current + delta; // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}
public final int incrementAndGet() {
    for (;;) {
        int current = get();
        int next = current + 1; // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}
It seems clear that the second method could have been written:
public final int incrementAndGet() {
    return addAndGet(1);
}
There are several other examples of similar code duplication in that class. I can't think of any reason for it other than performance considerations (*). And I am pretty sure the authors did some in-depth testing before settling on that design.
Why (or in what circumstances) would the first code perform better than the second?
(*) I could not resist writing a quick micro-benchmark. Post-JIT, it shows a systematic 2-4% performance gap in favour of addAndGet(1) vs incrementAndGet() (admittedly small, but very consistent). To be honest, I can't really explain that result either...
Output (times in milliseconds):
incrementAndGet(): 905
addAndGet(1): 868
incrementAndGet(): 902
addAndGet(1): 863
incrementAndGet(): 891
addAndGet(1): 867
...
Code:
public static void main(String[] args) throws Exception {
    final int size = 100_000_000;
    long start, end;
    AtomicInteger ai;

    System.out.println("JVM warmup");
    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));

        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
    }

    System.out.println("\nStart measuring\n");
    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));

        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
    }
}
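Worth noting: the benchmark above is single-threaded, so the compareAndSet never fails and the for(;;) loop never retries. Under contention, both methods go through the same retry loop and remain exact. A sketch of a contended variant (thread and iteration counts are arbitrary choices of mine, not from the original post):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ContendedCount {
    // Hammers one AtomicInteger from several threads; when compareAndSet
    // fails under contention, the internal for(;;) loop retries, so the
    // final count is still exact.
    static int count(int threads, final int itersPerThread) throws InterruptedException {
        final AtomicInteger ai = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(new Runnable() { // anonymous class, JDK 7 style
                public void run() {
                    for (int i = 0; i < itersPerThread; i++) {
                        ai.incrementAndGet();
                    }
                }
            });
        }
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();
        return ai.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "count = 4000000"
        System.out.println("count = " + count(4, 1_000_000));
    }
}
```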
I'll offer a new conjecture. If we look into the bytecode of AtomicInteger, we will see that the main difference between the two methods is that addAndGet uses an iload_1 instruction, whereas incrementAndGet uses an iconst_1 instruction:
public final int addAndGet(int);
  ...
  4: istore_2
  5: iload_2
  6: iload_1
  7: iadd

public final int incrementAndGet();
  ...
  4: istore_1
  5: iload_1
  6: iconst_1
  7: iadd
It seems that iconst_1 + iadd gets translated into an INC instruction, whereas iload_1 + iadd becomes an ADD instruction. This relates to the well-known question of ADD 1 vs INC on x86:
Relative performance of x86 inc vs. add instruction
Is ADD 1 really faster than INC ? x86
This could be the answer to why addAndGet is slightly faster than incrementAndGet.
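One way to check the bytecode side of this conjecture locally (my own sketch; the class and method names are purely illustrative) is to compile a minimal pair of methods and compare their `javap -c` output:

```java
// Compile with javac, then inspect with `javap -c PlusDemo`. The compilers
// I have seen emit iconst_1 + iadd for plusOne and iload_1 + iadd for
// plusDelta, mirroring incrementAndGet vs addAndGet above (exact output
// may vary by compiler).
public class PlusDemo {
    static int plusOne(int x) { return x + 1; }
    static int plusDelta(int x, int delta) { return x + delta; }

    public static void main(String[] args) {
        // prints "42 == 42"
        System.out.println(plusOne(41) + " == " + plusDelta(40, 2));
    }
}
```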