I am a student who wants to roughly measure the interrupt latency of ARM Cortex-M series chips without using an oscilloscope. However, I have encountered a very peculiar issue that has left me extremely puzzled.
First, let me briefly outline my measurement approach. I attempted to roughly measure the interrupt latency using the built-in timer of the MCU. My basic idea was as follows: before entering the Interrupt Service Routine (ISR), I would reset the timer to zero. Then, I would generate an interrupt request and enter the ISR. Immediately upon entering the ISR, I would read the value of the timer. At that moment, the timer value could serve as a very rough estimate of the interrupt latency.
Due to certain reasons, I need to write the mentioned ISR function using Assembly language. In the main function, I generate a software interrupt using the SWIER register, which allows me to enter the ISR I have written.
I encountered a peculiar issue during this process. When writing an ISR in Assembly language, it is customary to push several registers onto the stack upon entering the function, such as R4 to R11, and pop them back when exiting the function. Therefore, in theory, if I don't push the registers onto the stack upon entering the ISR (assuming I don't use those registers within the ISR), the rough measurement of interrupt latency should be slightly smaller compared to when I do push the registers onto the stack upon entering the ISR. However, through my experiments, I found that the results were exactly the opposite. If I push the registers onto the stack upon entering the ISR, the measured interrupt latency is actually slightly smaller. I conducted experiments using different IDEs (corresponding to different compilers) and also experimented with different products using the same ARM chip (STM32103ZET6 & STM32F103C8T6), but the results remained consistent. This has left me thoroughly puzzled.
item | config |
---|---|
CPU | intel i5 8400 |
OS | Windows 10 |
IDE | Keil uVision 5 V5.39.0.0 & IAR For ARM version 8.32.1.18631 |
C/C++ | C99&C++11, optimization: O0 |
STM32CUBEMX | version 6.10.0 |
ARM | ARM Cortex M3 |
MCUs | STM32F103ZET6 & STM32F103C8T6 |
Timer | Timer2, Prescaler: 0, ARR: 65535, Counter mode: up, auto-reload preload: enable |
EXI | EXIT0, PA0, Priority Group: 4 bits, Preemption Priority: 1, Sub Priority: 0 |
In the table below, the CLOCK column represents the frequency of SYSCLK (HCLK). R4-R11 indicates the timer count value obtained from reading the Timer when registers R4 to R11 were pushed onto the stack upon entering the ISR. Similarly, R4 represents the timer count value obtained when only register R4 was pushed onto the stack upon entering the ISR. "Unstacked" indicates the timer count value obtained when no registers were pushed onto the stack upon entering the ISR. The timer count values in the table are obtained from multiple experiments.
From the above experimental results, it can be observed that during the measurement process, if only register R4 is pushed onto the stack, the timer count value obtained is smaller compared to when registers R4 to R11 are pushed onto the stack (which aligns with theory and intuition). However, when we choose not to push the registers onto the stack, something strange happens: the timer count value obtained is larger, which contradicts our intuition.
【MCU】:STM32F103C8T6
【IDE】: Keil uVision 5 V5.39.0.0
CLOCK | R4~R11 | R4 | unstacked |
---|---|---|---|
72HMZ | 40~42 | 34~36 | 78~80 |
40MHZ | 38~40 | 34~36 | 70~72 |
20MHZ | 38 | 32 | 62~64 |
【MCU】:STM32F103ZET6
【IDE】: Keil uVision 5 V5.39.0.0
CLOCK | R4~R11 | R4 | unstacked |
---|---|---|---|
72HMZ | 40 | 34 | 78 |
40MHZ | 38~40 | 32 | 68~70 |
20MHZ | 38 | 30~32 | 62~64 |
【MCU】:STM32F103ZET6
【IDE】: IAR For ARM version 8.32.1.18631
CLOCK | R4~R11 | R4 | unstacked |
---|---|---|---|
72HMZ | 40~42 | 34~36 | 78~80 |
40MHZ | 38~40 | 32 | 66 |
20MHZ | 38~40 | 32 | 66 |
From the above experimental results, it can be observed that during the measurement process, if only register R4 is pushed onto the stack, the timer count value obtained is smaller compared to when registers R4 to R11 are pushed onto the stack (which aligns with theory and intuition). However, when we choose not to push the registers onto the stack, something strange happens: the timer count value obtained is larger, which contradicts our intuition.
Here is some of the core code.
//main.c
/* USER CODE BEGIN WHILE */
while (1)
{
TIM2->CNT = 0; //set timer2 count=0
EXTI->SWIER |= 1; //call EXIT0 ISR
for(;i!=0;i--){} //delay
printf("[%5d]:TIM time t is :%d\n",line,t);
printf("***********[%5d]:*******\n",line);
line++;
i = COUNT_NUM;
HAL_GPIO_TogglePin(GPIOC,GPIO_PIN_13);
/* USER CODE END WHILE */
/* USER CODE BEGIN 3 */
}
;ISR.s
AREA ISR,CODE,READONLY
EXPORT EXTI0_IRQHandler
;EXTI0_IRQHandler is a global function name that is an ISR written in the Assembly language.
;To enable EXTI0_IRQHandler, the automatically generated EXTI0_IRQHandler function needs to be commented out in the stm32f1xx_it.c file.
IMPORT t
;t is a global variable used to store the value of the timer read in the ISR.
IMPORT exit_pr_addr
;exit_pr_addr is the global variable used to store the address of the EXTI_PR register.
;This address is used in the ISR in order to clear the interrupt request flag
IMPORT tim_cnt_addr
;tim_cnt_addr is a global variable used to store the address of the count register in the timer.
;This address is used in the ISR in order to read the count value of the timer
EXTI0_IRQHandler
;push {r4-r11}
;push {r4}
;;;t = TIM2->CNT
ldr r1,=tim_cnt_addr;read global var addr
ldr r2,[r1] ;read tim_cnt_addr value
ldr r2,[r2] ;read cnt value
ldr r0,=t ;get global var t addr
str r2,[r0] ;set t = TIM2->CNT
;;;clear interupt request flg
ldr r1,=exit_pr_addr ;read global var addr
ldr r1,[r1] ;read exit_pr_addr value
mov r3,#(1<<0) ;
str r3,[r1];
;pop {r4}
;pop {r4-r11}
bx lr
END
Some settings for compiling with Keil uVision 5 V5.39.0.0.
I have packaged the above mentioned project code on STM32F103C8T6, which compiles successfully under Keil uVision 5 V5.39.0.0. Under the IAR platform, the project code is almost the same, except for some differences in the format of the ISR written in assembly language, the rest of the code remains the same. This part of the code is given in the ISR_FOR_IAR.s file in my attachment. The code mentioned above is uploaded to google drive as attachment.
Is there anyone who can explain this strange problem I've discovered?
In the case without push/pop, you are probably running the ISR twice, due to a late clear of the flag which caused the interrupt. As the two ISRs run back to back, main prints the value stored in the second run (thus roughly the whole duration of ISR, plus "half" of the second one, minus correction for tail-chaining).
You can confirm or reject this by moving the pop before clearing the interrupt flag, or by adding a counter counting the number, or by checking the interrupt flag upon entry to the ISR, or by clearing TIMx_CNT in the ISR, or by setting TIMx_CNT to some conspicuous value in the ISR, or... there are probably many ways.
P.S. An oscilloscope, or at least a cheap logic analyzer (LA) is an indispensable tool for microcontroller programming; I recommend you get some. one of the methods to check interrupts latency/duration is to toggle a pin and measure - you've mentioned that yourself.