While benchmarking my Rocketcore CPU, I ran into a failing Coremark run. After some debugging, I narrowed the issue down to global variables that are initialized to 0 not actually holding 0. Coremark initializes some volatile variables to 0x0, but at run time those variables hold wrong values.
Environment information for reproducing the issue:
My simulation flow:
$ make clean
$ make bmarks=test
$ riscv32-unknown-elf-objcopy -O binary test.riscv test.bin
$ hexdump -v -e/'4 "%08x\n" ' test.bin > test.hex
$ chmod 755 test.hex
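For readers unsure what the hexdump step produces, here is a minimal C sketch of the same conversion: one 32-bit word per line, printed as 8 hex digits. It assumes a little-endian host (as far as I understand, hexdump reads 4-byte units in host byte order) and an image length that is a multiple of 4. The names format_word_le and dump_words are made up for illustration and are not part of any tool.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format 4 bytes as one little-endian 32-bit word, 8 hex digits plus NUL.
   Hypothetical helper; out must hold at least 9 bytes. */
void format_word_le(const uint8_t *bytes, char out[9])
{
    uint32_t word = (uint32_t)bytes[0]
                  | ((uint32_t)bytes[1] << 8)
                  | ((uint32_t)bytes[2] << 16)
                  | ((uint32_t)bytes[3] << 24);
    snprintf(out, 9, "%08" PRIx32, word);
}

/* Write a whole binary image as one hex word per line.
   Assumption: len is a multiple of 4 (no trailing partial word). */
void dump_words(const uint8_t *image, size_t len, FILE *out)
{
    char line[9];
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        format_word_le(image + i, line);
        fprintf(out, "%s\n", line);
    }
}
```

Calling dump_words(image, len, stdout) on the contents of test.bin should give output in the same shape as test.hex, which is handy for checking that a memory loader expects the same byte order.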
Expected Behaviour:
With PERFORMANCE_RUN=1, all globally defined seed values and all locally defined num values should be initialized with their predefined values:
seed1_volatile=0x0
seed2_volatile=0x0
seed3_volatile=0x66
seed4_volatile=0xa
seed5_volatile=0
num1=0x0
num2=0x0
num3=0x0
num4=0x0
num5=0x0
Actual behaviour:
The debug output shows that the seed values globally initialized to 0 hold seemingly random numbers, while the other values are as expected:
seed1_volatile=0xdd232eba
seed2_volatile=0xedd684db
seed3_volatile=0x66
seed4_volatile=0xa
seed5_volatile=0xf870bef0
num1=0x0
num2=0x0
num3=0x0
num4=0x0
num5=0x0
My debug attempts:
%d, %lu, %x, %s, %f: output as expected.
PERFORMANCE_RUN flag value: matches the setting.
(0x80000000 - 0x90000000)
What I found out:
The global variables are loaded from memory with lw a1,offset(a1) or lw a1,offset(gp), while the local variables seem to be set by loading the value as an immediate with li a1,0.
At this point, I do not know how to globally initialize a variable to 0. If anyone can point out a possible reason, or a direction for gathering further debug info, it will be much appreciated.
Thank you in advance for any hint!
[Self-posted answer]
After two weeks of debugging, I finally figured out where the issue is (it came down to my lack of knowledge of compilation and assembly): the original crt.S and test.ld provided in the common folder of the rocket-tools repo do NOT clear the .bss section, as the developers expect the bootloader to do that initialization. Other users have posted about this issue before: post 1, post 2.
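To make the cause concrete, here is a small C sketch modeled on the variable names in the debug output. The int32_t types and the helper functions are my assumptions, not the actual Coremark source. With a typical GCC configuration, zero-initialized globals are not stored in the program image; they are placed in .bss (or .sbss for small objects on RISC-V), and the startup code is expected to zero that range. Globals with nonzero initializers go to .data/.sdata and are part of the image, and locals can simply be set with an immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Zero initializers: typically placed in .bss/.sbss. No bytes are stored in
   the image, so these read as 0 only if something zeroes that memory. */
volatile int32_t seed1_volatile = 0x0;
volatile int32_t seed2_volatile = 0x0;
volatile int32_t seed5_volatile = 0;

/* Nonzero initializers: typically placed in .data/.sdata, so the values
   come from the image and do not depend on .bss being cleared. */
volatile int32_t seed3_volatile = 0x66;
volatile int32_t seed4_volatile = 0xa;

/* A local: the compiler can materialize the constant directly
   (for example li a1,0), so no memory initialization is involved. */
int32_t read_local_num(void)
{
    int32_t num1 = 0x0;
    return num1;
}

/* Hypothetical self-check: fails if the zero-initialized seeds were not
   zeroed before this code runs. */
void check_zero_seeds(void)
{
    assert(seed1_volatile == 0);
    assert(seed2_volatile == 0);
    assert(seed5_volatile == 0);
}
```

On a hosted system the checks pass because the loader and C runtime zero .bss. On bare metal with a crt.S that skips this step, the three zero seeds hold whatever the memory happened to contain, which matches the pattern in the output above: only seed1, seed2 and seed5 were wrong.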
So my solution is to add the clearing code to crt.S, before the trap vector initialization:
# init bss section
la a0, __sbss #load the starting address of bss to a0
la a1, __ebss #load the ending address of bss to a1
bgeu a0, a1, done_bss #do not clear if a0>=a1, i.e. bss section is empty
clear_bss:
sw x0, (a0) #store 0 to the address in a0 (bss)
addi a0, a0, 4 #increment by 4
bltu a0, a1, clear_bss #if bss end not reached, continue to store 0
done_bss:
# <original crt.S continues>
In addition, define the symbols __sbss and __ebss in test.ld by modifying the bss sections:
.sbss : {
__sbss = .; /*starting address of bss*/
*(.sbss .sbss.* .gnu.linkonce.sb.*)
*(.scommon)
}
.bss : {
*(.bss)
__ebss = .; /*ending address of bss*/
}
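For anyone who reads C more easily than assembly, the clearing loop above can be modeled roughly as follows. This is a sketch only: clear_bss_model is a made-up name, and start and end stand in for the addresses of __sbss and __ebss. Like the assembly, it assumes the start address is 4-byte aligned and writes whole words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the clear_bss loop: store zero words from start up to,
   but not including, end. An empty range (start >= end) is left untouched,
   mirroring the bgeu check before the loop. */
void clear_bss_model(uint32_t *start, uint32_t *end)
{
    assert(start != NULL && end != NULL);
    while (start < end) {
        *start = 0;   /* sw x0, (a0) */
        start++;      /* addi a0, a0, 4 */
    }
}
```

One thing this makes visible: the loop covers everything between the two symbols, so the fix relies on .sbss and .bss being laid out back to back in test.ld, with __sbss at the start of the first and __ebss at the end of the second.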
Now the simulation output of my test program is correct:
seed1_volatile=0x0
seed2_volatile=0x0
seed3_volatile=0x66
seed4_volatile=0xa
seed5_volatile=0
num1=0x0
num2=0x0
num3=0x0
num4=0x0
num5=0x0