How to estimate usage of library functions

I'm trying to calculate maximum stack usage of an embedded program using static analysis.

I've used the compiler flag -fstack-usage to get the maximum stack usage for each function and the flag -fdump-rtl-expand to generate a graph of all function calls.

The last missing ingredient is stack usage of built-in functions. (at the moment it's only memset)

I guess I could measure it some other way and put a constant into my script. However, I don't want a situation where the implementation of the built-in function changes in a new version of GCC and the value in my script stays the same.

Maybe there is some way to compile built-in functions with the flag -fstack-usage? Or some other way to measure their stack usage via static analysis?

Edit:

This question is not a duplicate of Stack Size Estimation. The other question is about estimating stack usage of an entire program while I asked how to estimate it for a single ~~built-in~~ library function. The other question doesn't even mention ~~built-in~~ library functions nor any of the answers for it does.

Solution

Approach 1 (dynamic analysis)

You could determine stack size at runtime by filling stack with a predefined pattern, executing memset and then checking how many bytes have been modified. This is slower and more involved as you need to compile a sample program, upload it to target (unless you have a simulator) and collect results. You'll also need to be careful about test data that you supply to the function as execution path may change depending on size, data alignment, etc.

For a real-world example of this approach check Abseil's code.

Approach 2 (static analysis)

In general static analysis of binary code is tricky (even disassembling it isn't trivial) and you'd need sophisticated symbolic execution machinery to deal with it (e.g. miasm). But in most cases you can safely rely on detecting patterns which your compiler uses to allocate frames. E.g. for x86_64 GCC you could do something like:

objdump -d /lib64/libc.so.6 | sed -ne '/<__memset_x86_64>:/,/^$/p' > memset.d
NUM_PUSHES=$(grep -c pushq memset.d)
LOCALS=$(sed -ne '/sub .*%rsp/{ s/.*sub \+\$\([^,]\+\),%rsp.*/\1/; p }' memset.d)
LOCALS=$(printf '%d' $LOCALS)  # Unhex
echo $(( LOCALS + 8 * NUM_PUSHES ))

Note that this simple approach produces a conservative estimate (getting more precise result is doable but would require a path-sensitive analysis which requires proper parsing, building control-flow graph, etc.) and does not handle nested function calls (can be easily added but should probly be done in a language more expressive than shell).

AVR assembly is in general more complicated because you can't easily detect allocation of space for local variables (modification of stack pointer is split across several in, out and adiw instructions so would require non-trivial parsing in e.g. Python). Simple functions like memset or memcpy don't use local variables so you can still get away with simple greps:

NUM_PUSHES=$(grep -c 'push ' memset.d)
NUM_RCALLS=$(grep -c 'rcall \+\.+0' memset.d)
# A safety check for functions which we can't handle
if grep -qi 'out \+0x3[de]' memset.d; then
  echo >&2 'Unable to parse stack modification'
  exit 1
fi
echo $((NUM_PUSHES + 2 * NUM_RCALLS))