I'm currently writing x86 assembly (FASM) by hand, and one typical mistake I often make is to push an argument on the stack but return before the pop is executed.
This causes the stack offset to change for the caller, which will make the program crash.
This is a rough example to demonstrate it:
proc MyFunction
    ; A loop:
    mov ecx, 100
.loop:
    push ecx
    ; ==== loop content
    ...
    ; Somewhere, the decision is made to return, not just to exit the loop
    jmp .ret
    ...
    ; ==== loop content
    pop ecx
    loop .loop
.ret:
    ret
endp
Now, the obvious answer is to pop the proper number of elements off the stack before issuing a ret. However, it's easy to overlook something in 1000+ lines of handcrafted assembly.
I was also thinking about always using pushad / popad, but I'm not sure what the convention is for that.
Question: Is there any pattern that I could follow to avoid this issue?
Normally don't use push/pop inside loops; use mov like a compiler would so you're not moving ESP around unnecessarily. (That can lead to extra stack-sync uops if/when you reference ESP explicitly for other locals.)
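As a rough sketch of that idea applied to the question's MyFunction: reserve a slot once in the prologue, then spill and reload with mov. This assumes 32-bit code and plain FASM labels instead of the proc macro; the slot layout is illustrative, not the asker's actual code. Because ESP never moves inside the loop, an early jmp .ret from anywhere in the body still reaches a correct epilogue.

```asm
use32

; Hypothetical rewrite of the question's MyFunction.
MyFunction:
        sub     esp, 4              ; prologue: reserve one dword slot; ESP stays fixed after this
        mov     ecx, 100
.loop:
        mov     [esp], ecx          ; spill the counter with mov instead of push
        ; ==== loop content (free to clobber ECX) ====
        ; an early return from here is just: jmp .ret
        mov     ecx, [esp]          ; reload the counter, replaces pop ecx
        loop    .loop
.ret:
        add     esp, 4              ; the only place that moves ESP back
        ret
```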
Or in this case, just pick a different register for each of your two loops, or keep the outer loop counter entirely in memory after reserving some stack space (sub dword [esp], 1 / jnz .outer_loop, or [ebp-4] if you set up EBP as a frame pointer instead of just using it as another call-preserved register).
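A minimal sketch of the counter-in-memory variant, assuming 32-bit code and a nested loop where the inner loop wants ECX for itself. The trip counts (100 and 10) and the label names are made up for illustration.

```asm
use32

MyFunction:
        sub     esp, 4              ; reserve a slot for the outer counter
        mov     dword [esp], 100    ; the outer counter lives in memory, not in a register
.outer_loop:
        mov     ecx, 10             ; hypothetical inner trip count
.inner_loop:
        ; ==== inner loop content ====
        dec     ecx
        jnz     .inner_loop
        sub     dword [esp], 1      ; decrement the outer counter in place
        jnz     .outer_loop         ; SUB sets ZF, so no separate compare is needed
        add     esp, 4              ; release the slot
        ret
```

No push or pop appears in either loop, so there is nothing to forget on an early exit other than the single add esp, 4.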
Spilling/reloading a register around something inside a loop is inefficient. Your first step in freeing up registers should be to keep read-only things in memory if they're not needed extremely often. For example, an outer loop whose bound stays in memory, like inc edx / cmp edx, [esp+12] / jbe .outer_loop, avoids a store/reload. Only keep mutable things in memory when you run out of registers, and then of course prefer things that aren't changed often.
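For instance, a sketch where the loop bound is a stack argument that is only ever read. The layout is an assumption: a cdecl-style function with one dword argument, and two saved registers so that the argument ends up at [esp+12].

```asm
use32

; Assumed stack layout after the two pushes:
;   [esp]    saved ESI
;   [esp+4]  saved EBX
;   [esp+8]  return address
;   [esp+12] first argument (the loop bound)
MyFunction:
        push    ebx
        push    esi
        xor     edx, edx            ; counter = 0, kept in a register
.outer_loop:
        ; ==== loop content, may use EBX/ESI but must preserve EDX ====
        inc     edx
        cmp     edx, [esp+12]       ; the bound is read straight from memory
        jbe     .outer_loop         ; unsigned compare: repeat while EDX <= bound
        pop     esi
        pop     ebx
        ret
```

The [esp+12] offset only stays valid because ESP doesn't move inside the loop, which is another reason to keep push/pop out of loop bodies.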
In compiler-generated code, you'll normally only see pushes in the prologue, and pops along paths that lead to a ret. That makes it easy to match them up. If you need to save another call-preserved register for use inside the function, or reserve more stack space for locals, you change the sequence of pushes at the top of the function, and then change the epilogue in the return path(s).
(You can have more than one way out of a function. Especially if there's not much cleanup needed, tail duplication can be better than a jmp to the other copy of the epilogue.)
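A sketch of that compiler-style layout, with one shared epilogue that every return path jumps to. The saved registers, the 8 bytes of locals, and the early-exit condition are all hypothetical.

```asm
use32

MyFunction:
        push    ebx                 ; prologue: every push in the function happens here
        push    esi
        sub     esp, 8              ; two dwords of locals at [esp] and [esp+4]
        mov     ecx, 100
.loop:
        ; ==== loop content ====
        test    eax, eax            ; hypothetical early-exit condition
        jz      .ret                ; early exit goes to the shared epilogue
        dec     ecx
        jnz     .loop
.ret:
        add     esp, 8              ; epilogue mirrors the prologue in reverse order
        pop     esi
        pop     ebx
        ret
```

With this shape, checking the function comes down to comparing two short instruction sequences, rather than hunting for a stray push in the body.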
You don't have to be as rigidly disciplined (or braindead) as a compiler; after all, you're writing by hand in asm to get better performance. (Right? Otherwise just let a compiler do the micro-optimization for you in generating "thousands of lines" of asm! Medium to large amounts of code are where compilers really shine in their ability to quickly analyze data flow and make pretty decent code.)
So you can, for example, use the asm stack as a stack data structure, which is something you can't convince a compiler to do. (The question "Using the callstack to implement a stack data structure in C?" shows an unsafe attempt, though.) That means using push and pop directly, with "empty" detection via a pointer compare. In that case you'd want to be using EBP as a frame pointer, if you have any other need for stack memory.
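A small sketch of that pattern, assuming 32-bit code. The function name, the single local, and the items pushed are invented for illustration. EBP anchors the frame, so the local stays addressable while ESP moves, and the epilogue restores ESP from EBP no matter how many items are still on the stack.

```asm
use32

; Hypothetical example: push a few work items, then pop until the data stack is empty.
ProcessItems:
        push    ebp
        mov     ebp, esp            ; EBP is the frame pointer
        sub     esp, 4              ; one local at [ebp-4]; the data stack grows below it
        mov     dword [ebp-4], 0    ; local: running sum
        mov     ecx, 3
.fill:
        push    ecx                 ; push the items 3, 2, 1
        dec     ecx
        jnz     .fill
.drain:
        lea     eax, [ebp-4]        ; address of the bottom of the data stack
        cmp     esp, eax            ; "empty" when ESP is back at the bottom
        je      .done
        pop     eax                 ; take the next item
        add     [ebp-4], eax        ; accumulate it into the local
        jmp     .drain
.done:
        mov     eax, [ebp-4]        ; return the sum (3 + 2 + 1) in EAX
        mov     esp, ebp            ; discards the local and any leftover items
        pop     ebp
        ret
```

The mov esp, ebp in the epilogue also addresses the original problem: an early jump to .done with items still pushed would not unbalance the stack.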