I have a simple C program:
int main(){
unsigned int counter = 0;
++counter;
++counter;
++counter;
return 0;
}
I am using the following compile flags:
arm-none-eabi-gcc -c -mcpu=cortex-m4 -march=armv7e-m -mthumb
-mfloat-abi=hard -mfpu=fpv4-sp-d16 -DPART_TM4C123GH6PM -O0
-ffunction-sections -fdata-sections -g -gdwarf-3 -gstrict-dwarf
-Wall -MD -std=c99 -c -MMD -MP -MF"main.d" -MT"main.o" -o"main.o" "../main.c"
(some -I directives removed for brevity)
Note that I'm deliberately using -O0 to disable optimisations because I'm interested in learning what the compiler does to optimise.
This compiles into the following assembly for ARM Cortex-M4:
6 unsigned int counter = 0;
00000396: 2300 movs r3, #0
00000398: 607B str r3, [r7, #4]
7 ++counter;
0000039a: 687B ldr r3, [r7, #4]
0000039c: 3301 adds r3, #1
0000039e: 607B str r3, [r7, #4]
8 ++counter;
000003a0: 687B ldr r3, [r7, #4]
000003a2: 3301 adds r3, #1
000003a4: 607B str r3, [r7, #4]
9 ++counter;
000003a6: 687B ldr r3, [r7, #4]
000003a8: 3301 adds r3, #1
000003aa: 607B str r3, [r7, #4]
Why are there so many ldr r3, [r7, #4] and str r3, [r7, #4] instructions generated? And why does r7 even need to be involved? Can't we just use r3?
Without optimisation (which this clearly is), all the compiler is obliged to do is emit instructions that result in the behaviour defined by the higher-level language. It is free to naïvely treat every statement entirely in isolation, and that's exactly what it's doing here. From the compiler's viewpoint:
- unsigned int counter - OK then, I need somewhere to store this variable, so I'll give it a slot in the local stack frame (the frame setup isn't part of the listing above, but r7 is being used as a frame pointer here).
- counter = 0; - OK, I remember that the storage for counter is in the local stack frame, so I just pick a scratch register, generate the value 0 and store it in that location, job done.
- ++counter; - Right then, I remember that the storage for counter is in the local stack frame, so I pick a scratch register, load it with the value of the variable, increment it, then update the variable by storing the result back. The result of the expression is unused, so forget about it. Job done.
- ++counter; - Right then, I remember that the storage for counter is in the local stack frame, so I pick a scratch register, load it with the value of the variable, increment it, then update the variable by storing the result back. The result of the expression is unused, so forget about it. Job done. As I am a piece of software I cannot even comprehend the human concept of déjà vu, much less experience it.
- ++counter; - Right then...

And so on. Every statement, perfectly compiled into machine instructions that do precisely the right thing. Exactly what you asked me to do. If you wanted me to reason about the code at a higher level and work out whether I can take advantage of the relationships between those statements, you should have said something...
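As a rough illustration of that difference, here is a C sketch. It is a simplified model, not a description of what GCC does internally. It treats the stack frame as a small byte array, uses a pointer named r7 as the frame pointer, and keeps counter at offset 4 to mirror [r7, #4] in the listing. The names frame, load_word, store_word, increment_statement, per_statement and reasoned are all hypothetical. per_statement() expands every "++counter;" from the same fixed template, just like the -O0 listing. reasoned() shows what becomes possible once the compiler is allowed to look across statements.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of main()'s stack frame: counter lives at offset 4,
   mirroring [r7, #4] in the -O0 listing. */
static uint8_t frame[8];

static uint32_t load_word(const uint8_t *addr) {      /* ldr r3, [r7, #4] */
    uint32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

static void store_word(uint8_t *addr, uint32_t value) { /* str r3, [r7, #4] */
    memcpy(addr, &value, sizeof value);
}

/* The fixed template used for every "++counter;". It assumes nothing about
   what earlier statements left in r3, so it reloads, adds and stores. */
static void increment_statement(uint8_t *r7) {
    uint32_t r3 = load_word(r7 + 4);   /* ldr  r3, [r7, #4] */
    r3 += 1;                           /* adds r3, #1       */
    store_word(r7 + 4, r3);            /* str  r3, [r7, #4] */
}

/* Statement-by-statement translation, as in the -O0 listing. */
uint32_t per_statement(void) {
    uint8_t *r7 = frame;               /* frame pointer     */
    uint32_t r3 = 0;                   /* movs r3, #0       */
    store_word(r7 + 4, r3);            /* str  r3, [r7, #4] */
    increment_statement(r7);           /* ++counter;        */
    increment_statement(r7);           /* ++counter;        */
    increment_statement(r7);           /* ++counter;        */
    return load_word(r7 + 4);
}

/* Looking across statements: the value is still in r3 after each add, so the
   reloads are redundant, and nothing else reads the stack slot, so the stores
   can go too. The three adds to a known starting value could then be folded
   into the constant 3. */
uint32_t reasoned(void) {
    uint32_t r3 = 0;
    r3 += 1;
    r3 += 1;
    r3 += 1;
    return r3;
}
```

Both functions produce 3. Only per_statement() touches memory, and it does so seven times. In the question's program counter is never read at all, so an optimising build (-O1 and above) would typically not keep it anywhere. reasoned() returns it only so that the sketch has something to check.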