Recently I wanted to see what is the performance difference invoking a method on an object directly, vs invoking the same method on the same object, If that objected is added in a single element ArrayList and we try to loop that one element List.
To be honest my assumption was that there would be small difference, as i would expect the single element loop to be unrolled, hence the invocation to mimic a direct invocation.
To test that I created the below simple JMH example: https://gist.github.com/nikkatsa/69a58c63c1b79b245d700f6465bce7f7
To my surprise invoking the method directly on the element seems to be ~2,5x faster. Which is a significant number, as my expectation totally different:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation thrpt 10 332576972.176 ± 2740616.755 ops/s
SingleElementLoopBenchmark.singleElementLoopInvocation thrpt 10 130179329.978 ± 1303776.248 ops/s
I tweeted a bit asking for some pointers to understand why that was happenning. I was told that running perf
will give me a clue. So i did run with -perf dtraceasm
and I got the below two outputs, the first for the direct invocation and the second for the looped one:
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub, version 115 (69 bytes)
0x000000010b73c92c: mov %rbp,%r9
0x000000010b73c92f: movzbl 0x94(%r9),%r10d ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000010b73ca5e
0x000000010b73c937: test %r10d,%r10d
0x000000010b73c93a: jne 0x000000010b73c9da ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0x000000010b73c940: mov $0x1,%ebp
0x000000010b73c945: data16 data16 nopw 0x0(%rax,%rax,1) ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
9.56% ↗ 0x000000010b73c950: mov 0x40(%rsp),%r10
1.00% │ 0x000000010b73c955: mov 0xc(%r10),%r10d ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
0.17% │ 0x000000010b73c959: mov 0xc(%r12,%r10,8),%r11d ; implicit exception: dispatches to 0x000000010b73ca12
11.18% │ 0x000000010b73c95e: test %r11d,%r11d
0.00% │ 0x000000010b73c961: je 0x000000010b73c9c9 ;*invokevirtual performAction {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
10.69% │ 0x000000010b73c963: mov %r9,(%rsp)
0.65% │ 0x000000010b73c967: mov 0x38(%rsp),%rsi
0.00% │ 0x000000010b73c96c: mov $0x1,%edx
0.14% │ 0x000000010b73c971: xchg %ax,%ax
10.08% │ 0x000000010b73c973: callq 0x000000010b6c2900 ; ImmutableOopMap{[48]=Oop [56]=Oop [64]=Oop [0]=Oop }
│ ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
│ ; {optimized virtual_call}
1.44% │ 0x000000010b73c978: mov (%rsp),%r9
0.19% │ 0x000000010b73c97c: movzbl 0x94(%r9),%r8d ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
9.77% │ 0x000000010b73c984: mov 0x108(%r15),%r10
0.99% │ 0x000000010b73c98b: add $0x1,%rbp ; ImmutableOopMap{r9=Oop [48]=Oop [56]=Oop [64]=Oop }
│ ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0.02% │ 0x000000010b73c98f: test %eax,(%r10) ; {poll}
0.28% │ 0x000000010b73c992: test %r8d,%r8d
0.00% ╰ 0x000000010b73c995: je 0x000000010b73c950 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
0x000000010b73c997: movabs $0x10abd7d62,%r10
0x000000010b73c9a1: callq *%r10 ;*invokestatic nanoTime {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@34 (line 124)
0x000000010b73c9a4: mov 0x30(%rsp),%r10
0x000000010b73c9a9: mov %rbp,0x18(%r10) ;*putfield measuredOps {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@49 (line 126)
0x000000010b73c9ad: mov %rax,0x30(%r10) ;*putfield stopTime {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@37 (line 124)
0x000000010b73c9b1: movq $0x0,0x20(%r10) ;*invokevirtual directInvocation {reexecute=0 rethrow=0 return_oop=0}
....................................................................................................
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub, version 115 (279 bytes)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000011153fffe
0x000000011153fa81: test %r10d,%r10d
0x000000011153fa84: jne 0x000000011153fbe4 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0x000000011153fa8a: mov 0x58(%rsp),%rdx
0x000000011153fa8f: test %rdx,%rdx
0x000000011153fa92: je 0x000000011153fd5e
0x000000011153fa98: mov $0x1,%ebx
╭ 0x000000011153fa9d: jmp 0x000000011153fad6
0.19% │ ↗ 0x000000011153fa9f: mov 0x58(%rsp),%r13
3.55% │ │ 0x000000011153faa4: mov (%rsp),%rcx
0.09% │ │ 0x000000011153faa8: mov 0x60(%rsp),%rdx
0.22% │ │ 0x000000011153faad: mov 0x50(%rsp),%r11
0.17% │ │ 0x000000011153fab2: mov 0x8(%rsp),%rbx ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ │ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.55% │↗│ 0x000000011153fab7: movzbl 0x94(%r11),%r8d ;*goto {reexecute=0 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.16% │││ 0x000000011153fabf: mov 0x108(%r15),%r10
0.28% │││ 0x000000011153fac6: add $0x1,%rbx ; ImmutableOopMap{r11=Oop rcx=Oop rdx=Oop r13=Oop }
│││ ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0.19% │││ 0x000000011153faca: test %eax,(%r10) ; {poll}
4.00% │││ 0x000000011153facd: test %r8d,%r8d
│││ 0x000000011153fad0: jne 0x000000011153fbe9 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
0.07% ↘││ 0x000000011153fad6: mov 0xc(%rcx),%r8d ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.22% ││ 0x000000011153fada: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
││ ; implicit exception: dispatches to 0x000000011153ff2a
0.21% ││ 0x000000011153fadf: mov 0x8(%r12,%r10,8),%edi ; implicit exception: dispatches to 0x000000011153ff3e
4.39% ││ 0x000000011153fae4: cmp $0x237565,%edi ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
││ 0x000000011153faea: jne 0x000000011153fc92
0.33% ││ 0x000000011153faf0: lea (%r12,%r10,8),%r9 ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% ││ 0x000000011153faf4: mov 0x10(%r9),%r9d
0.14% ││ 0x000000011153faf8: test %r9d,%r9d
╰│ 0x000000011153fafb: jle 0x000000011153fab7 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.98% │ 0x000000011153fafd: lea (%r12,%r8,8),%rdi ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb01: xor %r9d,%r9d ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% │ 0x000000011153fb04: mov 0x8(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff4e
0.06% │ 0x000000011153fb09: cmp $0x237565,%esi ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
0.00% │ 0x000000011153fb0f: jne 0x000000011153fcc2
3.93% │ 0x000000011153fb15: lea (%r12,%r10,8),%rax ;*invokeinterface get {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb19: mov 0x10(%rax),%r10d ;*getfield size {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::get@2 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fb1d: test %r10d,%r10d
│ 0x000000011153fb20: jl 0x000000011153fcf6 ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.28% │ 0x000000011153fb26: cmp %r10d,%r9d
0.00% │ 0x000000011153fb29: jae 0x000000011153fc1c
3.97% │ 0x000000011153fb2f: mov 0x14(%rax),%r10d ;*getfield elementData {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::elementData@1 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.05% │ 0x000000011153fb33: mov %r9d,%ebp ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.08% │ 0x000000011153fb36: mov 0xc(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff62
1.27% │ 0x000000011153fb3b: cmp %esi,%ebp
│ 0x000000011153fb3d: jae 0x000000011153fc5a
3.94% │ 0x000000011153fb43: shl $0x3,%r10
0.05% │ 0x000000011153fb47: mov 0x10(%r10,%rbp,4),%r9d ;*aaload {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::elementData@5 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
1.71% │ 0x000000011153fb4c: mov 0x8(%r12,%r9,8),%r10d ; implicit exception: dispatches to 0x000000011153ff72
17.85% │ 0x000000011153fb51: cmp $0x237522,%r10d ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')}
0.00% │ 0x000000011153fb58: jne 0x000000011153fef6 ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.79% │ 0x000000011153fb5e: mov %rdi,0x18(%rsp)
0.02% │ 0x000000011153fb63: mov %r8d,0x10(%rsp)
0.02% │ 0x000000011153fb68: mov %rbx,0x8(%rsp)
0.19% │ 0x000000011153fb6d: mov %r11,0x50(%rsp)
3.95% │ 0x000000011153fb72: mov %rdx,0x60(%rsp)
0.02% │ 0x000000011153fb77: mov %rcx,(%rsp)
0.03% │ 0x000000011153fb7b: mov %r13,0x58(%rsp)
0.36% │ 0x000000011153fb80: mov %rdx,%rsi
3.78% │ 0x000000011153fb83: mov $0x1,%edx
0.01% │ 0x000000011153fb88: vzeroupper
4.05% │ 0x000000011153fb8b: callq 0x00000001114c2900 ; ImmutableOopMap{[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
│ ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
│ ; {optimized virtual_call}
0.98% │ 0x000000011153fb90: mov 0x10(%rsp),%r8d
3.61% │ 0x000000011153fb95: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.24% │ 0x000000011153fb9a: mov 0x8(%r12,%r10,8),%r9d ; implicit exception: dispatches to 0x000000011153ff9e
0.74% │ 0x000000011153fb9f: inc %ebp ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.04% │ 0x000000011153fba1: cmp $0x237565,%r9d ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
0.00% │ 0x000000011153fba8: jne 0x000000011153fd36
3.60% │ 0x000000011153fbae: lea (%r12,%r10,8),%r11 ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fbb2: mov 0x10(%r11),%r9d
0.35% │ 0x000000011153fbb6: cmp %r9d,%ebp
╰ 0x000000011153fbb9: jge 0x000000011153fa9f ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0x000000011153fbbf: mov %ebp,%r9d
0x000000011153fbc2: mov 0x58(%rsp),%r13
0x000000011153fbc7: mov (%rsp),%rcx
0x000000011153fbcb: mov 0x60(%rsp),%rdx
0x000000011153fbd0: mov 0x50(%rsp),%r11
0x000000011153fbd5: mov 0x8(%rsp),%rbx
....................................................................................................
I would like some help if possible to understand the above two outputs. From the very little I understand it seems that when looping, the program is paying quite a cost of checking the size of the array list in order to perform the loop and also cost of casting the object to the appropriate type, due to the list generics. I cannot really understand if the loops gets unrolled, but I believe it does?
Would be grateful if someone understand the above output and can throw some pointers to what is happening.
You measure an operation that does virtually nothing vs. an operation that involves at least four dependent memory loads:
plus at least one bounds check. I say "at least", because that's what the best optimizing compiler could probably do. HotSpot, however, performs two bound checks: one in ArrayList.get(), another for elementData array access.
Furthermore, ArrayList holds an array of Objects, so in order to get an instance of Listener
from the list, an object needs to be cast to the target type. That's one more memory load plus the type check.
Look at the benchmark scores from a different angle: a single element list is not 2.5x slower, but rather 3 nanoseconds longer:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation avgt 5 2,340 ± 0,095 ns/op
SingleElementLoopBenchmark.singleElementLoopInvocation avgt 5 5,283 ± 0,074 ns/op
3 nanoseconds! It's just 6 CPU cycles of a 2 GHz core. The light can travel less than 1 metre in 3 nanoseconds. Did you really expect the overhead of ArrayList less than that? No matter how low the overhead is, if comparing it with no overhead at all, the ratio could be infinitely large.