This is from linux source arch/arm64/kernel/head.S showing the kernel start. The code first calls preserve_boot_args
and next calls el2_setup
using bl
(branch and link). I showed the procedure preserve_boot_args
also.
SYM_CODE_START(primary_entry)
bl preserve_boot_args
bl el2_setup // Drop to EL1, w0=cpu_boot_mode
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl set_cpu_boot_mode_flag
bl __create_page_tables
/*
* The following calls CPU setup code, see arch/arm64/mm/proc.S for
* details.
* On return, the CPU will be ready for the MMU to be turned on and
* the TCR will have been set.
*/
bl __cpu_setup // initialise processor
b __primary_switch
SYM_CODE_END(primary_entry)
SYM_CODE_START_LOCAL(preserve_boot_args)
mov x21, x0 // x21=FDT
adr_l x0, boot_args // record the contents of
stp x21, x1, [x0] // x0 .. x3 at kernel entry
stp x2, x3, [x0, #16]
dmb sy // needed before dc ivac with
// MMU off
mov x1, #0x20 // 4 x 8 bytes
b __inval_dcache_area // tail call
SYM_CODE_END(preserve_boot_args)
As I understand, bl
is for calling procedure (after procedure, return to the address kept in the lr - link register, x30) and b
is just going to the labeled address not returning. But in procedure preserve_boot_args
above, just at the end, there is b __inval_dcache_area
instruction which just goes to __inval_dcache_area
without returning. Then how does it return to the original code (where bl el2_setup
is)? And how does a procedure ends itself? The definition of SYM_CODE_END is like this :
#define SYM_END(name, sym_type) \
.type name sym_type ASM_NL \
.size name, .-name
#endif
I can't understand how this code makes it return to the address in lr
. Shouldn't we do something like mv pc, lr
?
This looks like a call optimization — sometimes called tail call optimization, which is useful in reducing the stack depth on recursion — but also useful in the general case.
The way this optimization works, a caller, A, calls a function, B, which calls another function, C. If B was going to return directly to A after calling C, then B can jump to C instead! Being none the wiser, C returns to its caller, which appears to be A. By doing this, B doesn't need a stack frame and doesn't have to save the link register — it simply passes its return address on to C.
This optimization skips the otherwise normal return of C to B, making C return directly to A. This transformation is only enabled (i.e. correct) under certain circumstances:
From a hardware perspective, when the optimization is used (it is coded in B and then B is invoked), it is as if B & C were merged: Function A calls "BC" if you will. Dynamically, there is one bl
(A->BC) and one ret
(BC->A) — nicely balanced, which is good for the hardware branch predictor's call-stack handling.
We cannot express the tail call optimization in most high level languages, as most have only "call subroutine" and do not have a "jump to subroutine" feature. So, at best, we can write code that does no work upon return as per the above, and let the language/compiler perform the optimization, if it knows the optimization.
In A calls B calls C, B & C are functions, but A may or may not be a function (it could just some assembly code — while it is a caller of B, A itself doesn't need to be invoked or invokable as a function. While the call chain can be deep, the first code at the very top a call chain is not a function (e.g. it is _start
or sometimes main
) and has no where to return to (so doesn't use ret
to exit; it doesn't have a return address parameter provided by a caller). (If code has a place to return, i.e. a return address to use, then by definition it is not the top of the call chain (it is nominally a function).)
This initial code can play the role of A but not of B or C in the pattern. Tail call is precluded for A's call to B when A is not a function because there is no caller of A for B to return to. This is why the pattern must be A calls B calls C, B & C must be functions, and we consider applying the optimization to B. If A is a function, it must have a caller, so then can play the role of the middle function in the pattern (as can C, e.g. if C calls D).