debugging arm embedded cortex-m hardfault

How to perform an MCU reset after a specific hardfault?


I couldn't find an existing question on Stack Overflow or via Google, so here is the context. I'm investigating an issue that leads to two different hardfaults on an ARM Cortex-M33.

The first one appears almost every time. The second happens very rarely (only happened a few times during 2 days of intensive testing).

The two hardfaults have different signatures: about 90% of the time, the first one is a precise data access error with BFAR valid (CFSR PRECISERR and CFSR BFARVALID are both set to 1, so the CFSR value is 0x00008200).
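
To make that signature concrete, here is a minimal sketch of how the two flags map onto the CFSR value. The macro and function names are my own, chosen for illustration; I believe the CMSIS headers provide equivalent masks (SCB_CFSR_PRECISERR_Msk and SCB_CFSR_BFARVALID_Msk), but please check your device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The BusFault status byte (BFSR) sits in CFSR bits 15:8. */
#define CFSR_PRECISERR  (1u << 9)   /* precise data bus error */
#define CFSR_BFARVALID  (1u << 15)  /* BFAR holds the faulting address */

/* Compile-time check that the two flags give the value quoted above. */
static_assert((CFSR_PRECISERR | CFSR_BFARVALID) == 0x00008200u,
              "PRECISERR | BFARVALID should equal 0x00008200");

/* Hypothetical helper: true when both flags are set, regardless of
 * any other CFSR bits that may also be set. */
bool cfsr_is_precise_bus_fault(uint32_t cfsr)
{
    const uint32_t mask = CFSR_PRECISERR | CFSR_BFARVALID;
    return (cfsr & mask) == mask;
}
```

Note that a mask test like this also matches when additional fault bits are set, whereas an exact comparison against 0x00008200 does not. Which of the two is appropriate depends on how strictly the signatures need to be separated.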

However, because the first hardfault occurs so often, it is difficult for me to reproduce the second one, which is the one I would like to investigate.

Question: Is there any way to reset the MCU by software once a hardfault has occurred?

I tried to sort the hardfault signatures based on the CFSR value as shown in the code below, but I was unable to reset the system because the MCU was obviously no longer able to execute instructions.

void HardFault_Handler(void)
{
  /* First hardfault */
  if (SCB->CFSR == 0x00008200)
  {
    APP_DEBUG_SIGNAL_SET(APP_HARD_FAULT);
    
    while(1) {} // Comment this line if NVIC_SystemReset works or there is a way to reset the MCU
//    NVIC_SystemReset(); // Does not work because MCU is in space and cannot execute this anymore
    
  }
  /* The hardfault I'd like to catch */
  else
  {  
    APP_DEBUG_SIGNAL_SET(APP_HARD_FAULT);

    while (1)
    {
    }
  }
}

Thank you for your advice.


Solution

  • The normal way to force an MCU reset is to write an invalid sequence to the watchdog register.

    For example if you normally refresh the watchdog with WDT.CLEAR = magic_number; then simply write an invalid number instead: WDT.CLEAR = 0;.

    As for how to resolve the hard faults, the best way by far is to set a breakpoint in the hardfault handler and, when it hits, view the instruction trace in your quality debugger to see which code was executed just before you landed there.

    Note however that some hard faults are caused by things like low-voltage detect or clock monitor, in which case software-anything is screwed and the MCU hardware ought to reset itself. Low-voltage detect peripherals may have a "warning interrupt" though, which fires when the voltage starts slipping but hasn't gone haywire just yet.
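
As a rough sketch of the watchdog approach applied to the handler from the question: reset on the known signature, and let the caller halt on anything else. Everything hardware-specific here is an assumption. The wdt_regs_t layout, the refresh key and the function names are hypothetical, and this only works if the watchdog is enabled and your particular watchdog treats an invalid write as a reset request, so check the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog register block; real layouts are vendor-specific. */
typedef struct {
    volatile uint32_t CLEAR;
} wdt_regs_t;

#define WDT_REFRESH_KEY   0xA5A5A5A5u  /* assumed "magic number" */
#define WDT_INVALID_KEY   0x00000000u  /* anything that is not the key */
#define KNOWN_FAULT_CFSR  0x00008200u  /* PRECISERR | BFARVALID, from the question */

/* Normal refresh, as the application would do periodically. */
void wdt_refresh(wdt_regs_t *wdt)
{
    wdt->CLEAR = WDT_REFRESH_KEY;
}

/* Request a reset by writing an invalid value instead of the key. */
void wdt_force_reset(wdt_regs_t *wdt)
{
    wdt->CLEAR = WDT_INVALID_KEY;
}

/* Returns true if a reset was requested for the known signature,
 * false if the fault is the rare one and the caller should halt. */
bool fault_dispatch(wdt_regs_t *wdt, uint32_t cfsr)
{
    if (cfsr == KNOWN_FAULT_CFSR) {
        wdt_force_reset(wdt);
        return true;
    }
    return false;
}
```

On the target, HardFault_Handler would pass the address of the real watchdog peripheral and the value of SCB->CFSR to fault_dispatch, then spin in while (1) in both cases: either until the reset takes effect, or so that the debugger can catch the rare fault.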