Why does Rocket Chip trap on an FPGA after executing code in DRAM?

I am trying to get a version of Rocket Chip up and running on a Nexys4 DDR board. So far I have managed to generate the Verilog, synthesize the design and attach a JTAG probe (SEGGER J-Link). I can access registers and memory via OpenOCD and GDB. After I load a small snippet of asm, the core starts executing but jumps directly to 0x00000000 after the first executed instruction. I assume it traps and, since the trap vector is not initialized, the core ends up at 0. Does anybody know how to fix this?

Simulation of the core works with both Verilator and VCS. In both cases the core executes the three asm instructions without complaint.

The tested asm code is:

.section .text
.global _start
_start:
    add x0,x0,x0
    add x0,x0,x0
    j _start

Linked with this script:

SECTIONS
{
    . = 0x80000000;
    .text : { *(.text) }
}

Object dump:

Disassembly of section .text:

0000000080000000 <_start>:
    80000000:   00000033                add     zero,zero,zero
    80000004:   00000033                add     zero,zero,zero
    80000008:   ff9ff06f                j       80000000 <_start>

Solution

I recently ran into a similar issue with DDR4, GDB and a SiFive RISC-V chip. After loading code into the DDR4 and attempting to step from the reset vector, the core would immediately jump to 0x00000000. After debugging with a Xilinx ILA, we found that although we were programming the DDR4 memory space through GDB, the core was caching some of the code internally and only occasionally writing some of it back to the DDR4. From the core's point of view this is fine: when you step, it uses the cache if the code is available there, and otherwise fetches it from the DDR4. But suppose your CPU issues several DDR bursts because it fetches a large chunk of code for efficiency. Part of that chunk, which may be empty space if your program is really small, will never have been programmed, so its ECC was never calculated correctly.

Check the machine cause register (mcause) after the jump to 0x00000000 and see whether it indicates 0x2, illegal instruction. In my case, I saw this because the bus reported a "bus error", which was induced by an ECC fault due to a half-programmed DDR burst.
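To make the raw value of the machine cause register easier to read, here is a minimal sketch of a decoder. The table follows the synchronous exception codes in the RISC-V privileged specification; the names `EXCEPTION_CODES` and `decode_mcause` are made up for this example, and `xlen=64` is an assumption based on the 64-bit addresses in the object dump above. Depending on the core, a failed fetch may also show up as code 1 (instruction access fault) rather than code 2.

```python
# Synchronous exception codes from the RISC-V privileged specification.
EXCEPTION_CODES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    11: "environment call from M-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_mcause(value, xlen=64):
    """Return a readable description of an mcause value.

    xlen is the register width in bits (assumed 64 here); the top bit
    distinguishes interrupts from synchronous exceptions.
    """
    interrupt = bool((value >> (xlen - 1)) & 1)
    code = value & ((1 << (xlen - 1)) - 1)
    if interrupt:
        return f"interrupt, code {code}"
    return EXCEPTION_CODES.get(code, f"reserved or custom exception code {code}")


if __name__ == "__main__":
    # Value read back from the debugger after the unexpected jump.
    print(decode_mcause(0x2))
```

Reading mepc and mtval alongside mcause should also show which address the core was fetching when it trapped.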

One way to work around this might be to pad your ELF with a bunch of extra zeros at the end, so that its size forces the cache to flush to memory. Once the DDR is really programmed and the ECC values are correct, you shouldn't see the illegal instruction anymore. Let me know whether that works for you.
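As a rough sketch of the padding idea, the helper below zero-fills a raw program image up to a multiple of a block size. It works on a flat binary (for example one produced with `objcopy -O binary`) rather than on the ELF itself. The name `pad_image` and the default `block_size` of 64 bytes are assumptions for illustration only; the right value depends on the cache and DDR burst sizes of your design and may need to be much larger to force a write-back.

```python
def pad_image(data: bytes, block_size: int = 64) -> bytes:
    """Zero-pad data so its length is a multiple of block_size.

    block_size is a placeholder; pick it from your cache line,
    DDR burst, or total cache size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    remainder = len(data) % block_size
    if remainder == 0:
        return data
    return data + bytes(block_size - remainder)


if __name__ == "__main__":
    # The three instructions from the question, little-endian encoded.
    program = bytes.fromhex("33000000" "33000000" "6ff09fff")
    padded = pad_image(program, 64)
    print(len(program), "->", len(padded))
```

The padded image can then be loaded at 0x80000000 in place of the original 12-byte program.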