cassemblyarmstm32cortex-m

Issue with measuring ARM MCU interrupt latency


Introduction

I am a student who wants to roughly measure the interrupt latency of ARM Cortex-M series chips without using an oscilloscope. However, I have encountered a very peculiar issue that has left me extremely puzzled.

First, let me briefly outline my measurement approach. I attempted to roughly measure the interrupt latency using the built-in timer of the MCU. My basic idea was as follows: before entering the Interrupt Service Routine (ISR), I would reset the timer to zero. Then, I would generate an interrupt request and enter the ISR. Immediately upon entering the ISR, I would read the value of the timer. At that moment, the timer value could serve as a very rough estimate of the interrupt latency.

Due to certain reasons, I need to write the mentioned ISR function using Assembly language. In the main function, I generate a software interrupt using the SWIER register, which allows me to enter the ISR I have written.

I encountered a peculiar issue during this process. When writing an ISR in Assembly language, it is customary to push several registers onto the stack upon entering the function, such as R4 to R11, and pop them back when exiting the function. Therefore, in theory, if I don't push the registers onto the stack upon entering the ISR (assuming I don't use those registers within the ISR), the rough measurement of interrupt latency should be slightly smaller compared to when I do push the registers onto the stack upon entering the ISR. However, through my experiments, I found that the results were exactly the opposite. If I push the registers onto the stack upon entering the ISR, the measured interrupt latency is actually slightly smaller. I conducted experiments using different IDEs (corresponding to different compilers) and also experimented with different products using the same ARM chip (STM32103ZET6 & STM32F103C8T6), but the results remained consistent. This has left me thoroughly puzzled.

Experimental Setup

item config
CPU intel i5 8400
OS Windows 10
IDE Keil uVision 5 V5.39.0.0 & IAR For ARM version 8.32.1.18631
C/C++ C99&C++11, optimization: O0
STM32CUBEMX version 6.10.0
ARM ARM Cortex M3
MCUs STM32F103ZET6 & STM32F103C8T6
Timer Timer2, Prescaler: 0, ARR: 65535, Counter mode: up, auto-reload preload: enable
EXI EXIT0, PA0, Priority Group: 4 bits, Preemption Priority: 1, Sub Priority: 0

Experimental result

In the table below, the CLOCK column represents the frequency of SYSCLK (HCLK). R4-R11 indicates the timer count value obtained from reading the Timer when registers R4 to R11 were pushed onto the stack upon entering the ISR. Similarly, R4 represents the timer count value obtained when only register R4 was pushed onto the stack upon entering the ISR. "Unstacked" indicates the timer count value obtained when no registers were pushed onto the stack upon entering the ISR. The timer count values in the table are obtained from multiple experiments.

From the above experimental results, it can be observed that during the measurement process, if only register R4 is pushed onto the stack, the timer count value obtained is smaller compared to when registers R4 to R11 are pushed onto the stack (which aligns with theory and intuition). However, when we choose not to push the registers onto the stack, something strange happens: the timer count value obtained is larger, which contradicts our intuition.

exp1

MCU】:STM32F103C8T6

IDE】: Keil uVision 5 V5.39.0.0

CLOCK R4~R11 R4 unstacked
72HMZ 40~42 34~36 78~80
40MHZ 38~40 34~36 70~72
20MHZ 38 32 62~64

exp2

MCU】:STM32F103ZET6

IDE】: Keil uVision 5 V5.39.0.0

CLOCK R4~R11 R4 unstacked
72HMZ 40 34 78
40MHZ 38~40 32 68~70
20MHZ 38 30~32 62~64

exp3

MCU】:STM32F103ZET6

IDE】: IAR For ARM version 8.32.1.18631

CLOCK R4~R11 R4 unstacked
72HMZ 40~42 34~36 78~80
40MHZ 38~40 32 66
20MHZ 38~40 32 66

From the above experimental results, it can be observed that during the measurement process, if only register R4 is pushed onto the stack, the timer count value obtained is smaller compared to when registers R4 to R11 are pushed onto the stack (which aligns with theory and intuition). However, when we choose not to push the registers onto the stack, something strange happens: the timer count value obtained is larger, which contradicts our intuition.

Experimental code

Here is some of the core code.

//main.c 
  /* USER CODE BEGIN WHILE */
  while (1)
  {
        TIM2->CNT = 0;      //set timer2 count=0
        EXTI->SWIER |= 1; //call EXIT0 ISR
        for(;i!=0;i--){}    //delay
        printf("[%5d]:TIM time t is :%d\n",line,t);
        printf("***********[%5d]:*******\n",line);
        line++;
        i = COUNT_NUM;
        HAL_GPIO_TogglePin(GPIOC,GPIO_PIN_13);
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
  }
;ISR.s
    AREA ISR,CODE,READONLY
    EXPORT EXTI0_IRQHandler
    ;EXTI0_IRQHandler is a global function name that is an ISR written in the Assembly language.
    ;To enable EXTI0_IRQHandler, the automatically generated EXTI0_IRQHandler function needs to be commented out in the stm32f1xx_it.c file.
    IMPORT t
    ;t is a global variable used to store the value of the timer read in the ISR.
    IMPORT exit_pr_addr
    ;exit_pr_addr is the global variable used to store the address of the EXTI_PR register. 
    ;This address is used in the ISR in order to clear the interrupt request flag
    IMPORT tim_cnt_addr
    ;tim_cnt_addr is a global variable used to store the address of the count register in the timer.
    ;This address is used in the ISR in order to read the count value of the timer

EXTI0_IRQHandler
    ;push {r4-r11}
    ;push {r4}
    ;;;t = TIM2->CNT
    ldr r1,=tim_cnt_addr;read global var addr
    ldr r2,[r1]         ;read tim_cnt_addr value
    ldr r2,[r2]         ;read cnt value
    ldr r0,=t           ;get global var t addr
    str r2,[r0]         ;set t = TIM2->CNT
    ;;;clear interupt request flg
    ldr r1,=exit_pr_addr ;read global var addr
    ldr r1,[r1]     ;read exit_pr_addr value
    mov r3,#(1<<0)  ;
    str r3,[r1];
    ;pop {r4}
    ;pop {r4-r11}
    bx lr
    END

Some settings for compiling with Keil uVision 5 V5.39.0.0.

pic3

pic4

Attachment

I have packaged the above mentioned project code on STM32F103C8T6, which compiles successfully under Keil uVision 5 V5.39.0.0. Under the IAR platform, the project code is almost the same, except for some differences in the format of the ISR written in assembly language, the rest of the code remains the same. This part of the code is given in the ISR_FOR_IAR.s file in my attachment. The code mentioned above is uploaded to google drive as attachment.

Is there anyone who can explain this strange problem I've discovered?


Solution

  • In the case without push/pop, you are probably running the ISR twice, due to a late clear of the flag which caused the interrupt. As the two ISRs run back to back, main prints the value stored in the second run (thus roughly the whole duration of ISR, plus "half" of the second one, minus correction for tail-chaining).

    You can confirm or reject this by moving the pop before clearing the interrupt flag, or by adding a counter counting the number, or by checking the interrupt flag upon entry to the ISR, or by clearing TIMx_CNT in the ISR, or by setting TIMx_CNT to some conspicuous value in the ISR, or... there are probably many ways.

    P.S. An oscilloscope, or at least a cheap logic analyzer (LA) is an indispensable tool for microcontroller programming; I recommend you get some. one of the methods to check interrupts latency/duration is to toggle a pin and measure - you've mentioned that yourself.