I know Java has yet to add tail-call elimination optimization and intends to do so as a late stage addition to Project Loom. My question is instead: does the JIT ever optimize away recursive method calls in their entirety and convert them into an iterative form? It seems nominally possible but relatively difficult so I'm guessing if they did it would be strongly described in some doc, but I'm struggling to track anything on the subject down.
As a follow-up, if the JIT does eliminate recursive calls in some form beyond what's described in Loom, how does that appear on stack traces?
Your question “how does that appear on stack traces?” is one of the unsolved problems of tail call optimization in a JVM. There’s an expectation of manageability of the Java code, especially when code is spending a long time in a recursive algorithm (or even hanging in an endless recursion) and the developer wants to find out what’s going on.
So in short, the JVM (current version of HotSpot) has no tail call optimization. But it is capable of inlining a finite number of recursive invocations, just like it can inline other invocations.
When we use
public class Recursion {
public static void main(String[] args) {
runs: for(int run = 0; run < 10; run++) {
for(int i = 1; i < 1_000_000_000; i++) try {
int counted = recursiveCounter(i, 0);
if(i != counted) throw new AssertionError();
} catch(StackOverflowError e) {
System.out.println("limit reached at " + i);
continue runs;
}
}
}
private static int recursiveCounter(int limit, int count) {
return limit == 0? count: recursiveCounter(limit - 1, count + 1);
}
}
we get values similar to those discussed in Why is the max recursion depth I can reach non-deterministic? Running with -XX:CompileCommand=print,Recursion.recursiveCounter
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
prints
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
and
0x0000026398871220: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000026398871227: push rbp
0x0000026398871228: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x000002639887122c: test edx,edx
0x000002639887122e: je 26398871254h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000026398871230: cmp edx,1h
0x0000026398871233: je 26398871259h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871235: add r8d,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871239: add edx,0fffffffeh ;*isub {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@10 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x000002639887123c: nop
0x000002639887123f: call 26398871220h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000026398871244: add rsp,10h
0x0000026398871248: pop rbp
0x0000026398871249: mov r10,qword ptr [r15+110h]
0x0000026398871250: test dword ptr [r10],eax ; {poll_return}
0x0000026398871253: ret
0x0000026398871254: mov eax,r8d
0x0000026398871257: jmp 26398871244h
0x0000026398871259: mov eax,r8d
0x000002639887125c: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x000002639887125e: jmp 26398871244h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871260: mov rdx,rax
0x0000026398871263: add rsp,10h
0x0000026398871267: pop rbp
0x0000026398871268: jmp 26390ea4d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
The interesting part is the test edx,edx
instruction matching our limit == 0
condition. If this condition is not fulfilled, there’s a cmp edx,1h
test following, effectively testing what the next recursive invocation would test and if this condition also isn’t fulfilled, the code will perform add r8d,2h
and add edx,0fffffffeh
, incrementing the count
by two and decrementing the limit
by two.
So we clearly see the effect of inlining one level of recursion and fusing the operations. The inlining diagnostics tells us that a limit has been crossed, so it’s worth to investigate what happens when specifying, e.g. -XX:MaxRecursiveInlineLevel=4
:
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
0x0000025345d31620: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000025345d31627: push rbp
0x0000025345d31628: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x0000025345d3162c: test edx,edx
0x0000025345d3162e: je 25345d31660h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000025345d31630: cmp edx,1h
0x0000025345d31633: je 25345d31665h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31635: cmp edx,2h
0x0000025345d31638: je 25345d3166ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163a: cmp edx,3h
0x0000025345d3163d: je 25345d31674h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163f: cmp edx,4h
0x0000025345d31642: je 25345d3167ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31644: add r8d,5h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31648: add edx,0fffffffbh
0x0000025345d3164b: call 25345d31620h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000025345d31650: add rsp,10h
0x0000025345d31654: pop rbp
0x0000025345d31655: mov r10,qword ptr [r15+110h]
0x0000025345d3165c: test dword ptr [r10],eax ; {poll_return}
0x0000025345d3165f: ret
0x0000025345d31660: mov eax,r8d
0x0000025345d31663: jmp 25345d31650h
0x0000025345d31665: mov eax,r8d
0x0000025345d31668: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x0000025345d3166a: jmp 25345d31650h
0x0000025345d3166c: mov eax,r8d
0x0000025345d3166f: add eax,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31672: jmp 25345d31650h
0x0000025345d31674: mov eax,r8d
0x0000025345d31677: add eax,3h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3167a: jmp 25345d31650h
0x0000025345d3167c: mov eax,r8d
0x0000025345d3167f: add eax,4h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31682: jmp 25345d31650h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31684: mov rdx,rax
0x0000025345d31687: add rsp,10h
0x0000025345d3168b: pop rbp
0x0000025345d3168c: jmp 2533e364d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
We can see that now, the compiled code has learned to add up to five at once. The application will also print a much larger limit of nested invocations. The comments behind the instruction show that the meta information about the nested invocations is still present, even if there are no individual instructions associated with an invocation level. This implies that a stack trace representing the whole original call chain could still be constructed.
Of course, this is not the same as actual tail call optimization. Regardless of how high we set the limit, it’s still inlining a finite number of invocation and it’s already recognizable that setting the limit too high would yield unreasonable code size.
So the answer to the literal question “Does the Java JIT ever optimize away recursive method calls?” is “Yes, it does”. But not the entire recursion but a finite number of calls, in other words, not in the form of tail call optimization.