In my project, I have written my own implementation of the higher levels of the Itanium ABI (__cxa_throw, etc.). However, stack unwinding seemed like too much for me to handle myself, so I decided to use LLVM's libunwind for unwinding and LLVM's libcxxabi for the personality function.
For reference, I'm compiling this for a MIPS I system with no OS and no FPU, so I'm compiling with the following defines: _LIBUNWIND_IS_NATIVE_ONLY, _LIBUNWIND_IS_BAREMETAL, _LIBUNWIND_HAS_NO_THREADS, _ABIO32 1, _MIPS_SIM _ABIO32, __mips_isa_rev 1, and manually undefining __mips_hard_float.
After adding these libraries to my program and fixing lots of compiler errors, I hit the following problem. I set my program up to do a simple try and catch, with the catch being a catch(...). However, it never reaches the catch block, nor does it call __cxa_begin_catch. Stepping through, it reaches Registers_mips_o32::jumpto, but the value that the ra register is set to is 0x00010000. This is without the compiler flag -fomit-frame-pointer. With that flag added, the ra register is instead set to 0xbf800000.
Neither of these is a valid address: on this system my program, heap, and stack all live within the range 0x80000000 to 0x801fffff.
In the source code for libunwind, the following comments and linker script are provided:
// When statically linked on bare-metal, the symbols for the EH table are looked
// up without going through the dynamic loader.
// The following linker script may be used to produce the necessary sections and symbols.
// Unless the --eh-frame-hdr linker option is provided, the section is not generated
// and does not take space in the output file.
//
// .eh_frame :
// {
// __eh_frame_start = .;
// KEEP(*(.eh_frame))
// __eh_frame_end = .;
// }
//
// .eh_frame_hdr :
// {
// KEEP(*(.eh_frame_hdr))
// }
//
// __eh_frame_hdr_start = SIZEOF(.eh_frame_hdr) > 0 ? ADDR(.eh_frame_hdr) : 0;
// __eh_frame_hdr_end = SIZEOF(.eh_frame_hdr) > 0 ? . : 0;
Adding this to my linker script was required for the program to link properly. My understanding is that this should place the .eh_frame data in the output and define the symbols libunwind uses to find it.
Checking the map file generated by the linker, I get the following section:
.eh_frame 0x000000008001cdf0 0x2188
0x000000008001cdf0 __eh_frame_start = .
*(.eh_frame)
.eh_frame 0x000000008001cdf0 0x45c test/build/Exception.o
0x4e8 (size before relaxing)
.eh_frame 0x000000008001d24c 0x18c test/build/Memory.o
0x1a0 (size before relaxing)
.eh_frame 0x000000008001d3d8 0x580 test/build/Typeinfo.o
0x594 (size before relaxing)
.eh_frame 0x000000008001d958 0x28 test/build/crt0.o
0x3c (size before relaxing)
.eh_frame 0x000000008001d980 0x280 test/build/main.o
0x2e0 (size before relaxing)
.eh_frame 0x000000008001dc00 0x1f8 test/build/UnwindLevel1.o
0x20c (size before relaxing)
.eh_frame 0x000000008001ddf8 0x0 test/build/UnwindLevel1-gcc-ext.o
0x190 (size before relaxing)
.eh_frame 0x000000008001ddf8 0x1180 test/build/libunwind.o
0x1930 (size before relaxing)
0x000000008001ef78 __eh_frame_end = .
.eh_frame_hdr 0x000000008001ce00 0x0
*(.eh_frame_hdr)
0x0000000000000000 __eh_frame_hdr_start = (SIZEOF (.eh_frame_hdr) > 0x0)?ADDR (.eh_frame_hdr):0x0
0x0000000000000000 __eh_frame_hdr_end = (SIZEOF (.eh_frame_hdr) > 0x0)?.:0x0
From what I can tell, this appears to be properly generated, but I'm not sure.
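One way to check this beyond reading the map file would be to walk the records in the section directly. The sketch below is only a rough sanity check, assuming the raw bytes of .eh_frame have been dumped to a buffer (for example with objcopy) and that 64-bit DWARF lengths are not in use. It relies on the .eh_frame layout in which each record starts with a 4-byte length followed by a 4-byte ID that is zero for a CIE and non-zero for an FDE. The function name and the byteorder parameter are made up for illustration; pass ">" for a big-endian target.

```python
import struct


def count_eh_frame_records(data, byteorder="<"):
    """Walk raw .eh_frame bytes and return (num_cies, num_fdes).

    Hypothetical helper for illustration; it only checks record framing,
    not the CFI instructions inside each record.
    """
    cies = fdes = 0
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(byteorder + "I", data, offset)
        if length == 0:
            break  # a zero-length record terminates the section
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit DWARF lengths are not handled in this sketch")
        body = offset + 4
        if length < 4 or body + length > len(data):
            raise ValueError(f"malformed record at offset {offset:#x}")
        (cie_id,) = struct.unpack_from(byteorder + "I", data, body)
        if cie_id == 0:
            cies += 1  # ID of zero marks a CIE in .eh_frame
        else:
            fdes += 1  # otherwise the field is an offset back to the owning CIE
        offset = body + length
    return cies, fdes
```

If the walk finishes without raising and reports at least one CIE and a plausible number of FDEs, the section framing is intact, which would point the search elsewhere.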
My question, then, is this: I don't have a strong understanding of libunwind or of stack unwinding. Is there something obvious here that I am missing?
So it turns out this was actually a bug in libunwind! I've submitted a pull request for it, and I'll summarise the problem here.
libunwind contains the following at the end of libunwind::Registers_mips_o32::jumpto:
lw $30, (4 * 30)($4)
// load new pc into ra
lw $31, (4 * 32)($4)
// jump to ra, load a0 in the delay slot
jr $31
The way this is supposed to work is that the second instruction loads the new return address into $ra, and the jr that follows then jumps to this new address.
The reason this doesn't work is that it doesn't take the load delay into account! On MIPS I, a value loaded with lw is not yet visible to the instruction immediately after the load. This means that when the jr executes, $ra has not yet been updated, so it jumps to the wrong address. This can be fixed either by adding a nop after the load, or by moving the instructions around like so:
// load new pc into ra
lw $31, (4 * 32)($4)
// allow for load delay, so that ra holds the new value when jumping
lw $30, (4 * 30)($4)
// jump to ra, load a0 in the delay slot
jr $31
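To make the effect of the reordering concrete, here is a toy model of the load delay. It is not a real emulator and not libunwind code: it only assumes that a value loaded by lw lands in the register file one instruction late, and the register contents, memory slot names, and addresses are invented for illustration.

```python
# Toy model of MIPS I load-delay semantics (illustrative only).
# Assumption: the instruction right after an lw still sees the old register value.

STALE_RA = 0xDEAD0000      # hypothetical value left in ra before jumpto runs
LANDING_PAD = 0x80001000   # hypothetical address the unwinder wants to resume at

REGS = {"ra": STALE_RA, "fp": 0}
MEMORY = {"pc_slot": LANDING_PAD, "fp_slot": 0x801FFF00}  # made-up saved context


def run(program, regs, memory):
    """Execute ("lw", reg, slot) / ("jr", reg) tuples; return the jr target."""
    regs = dict(regs)   # do not mutate the caller's registers
    pending = None      # (register, value) produced by the previous lw
    for op in program:
        in_flight, pending = pending, None
        target = None
        if op[0] == "lw":
            pending = (op[1], memory[op[2]])
        elif op[0] == "jr":
            target = regs[op[1]]  # reads the register file as it is right now
        if in_flight is not None:
            regs[in_flight[0]] = in_flight[1]  # earlier load lands after this op
        if target is not None:
            return target
    return None


# Original order: jr sits directly after the load of ra.
original = [("lw", "fp", "fp_slot"), ("lw", "ra", "pc_slot"), ("jr", "ra")]
# Fixed order: the fp load fills the load delay slot.
reordered = [("lw", "ra", "pc_slot"), ("lw", "fp", "fp_slot"), ("jr", "ra")]

print(hex(run(original, REGS, MEMORY)))
print(hex(run(reordered, REGS, MEMORY)))
```

In this model the original ordering jumps to whatever ra held before, while the reordered version jumps to the loaded address, because the fp load gives the ra load one instruction to complete.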