As far as I understand it, the stack pointer points to the "free" memory on the stack, and "pushing" data on the stack writes to the location pointed to by the stack pointer and then increments or decrements it.
But isn't it possible to use offsets from the frame pointer to achieve the same thing, thus saving a register? The overhead of adding offsets to the frame pointer is pretty much the same as the overhead of incrementing and decrementing the stack pointer. The only advantage I see is that accessing data at the "top" (or bottom) will be faster, as long as it is not a push or pop operation, e.g. just reading or writing to that address without incrementing/decrementing. But then again, such operations would take a single extra cycle using the frame pointer, and there would be one additional register for general-purpose use.
It seems like only the frame pointer is really needed. And it even serves a lot more purpose than just modifying data in the current stack frame, such as to be used in debugging and for stack unwinding. Am I missing something?
Well, yes, and in fact this is common for 64-bit code generators. There are complications, however, that keep it from being universally possible. A hard requirement is that the value of the stack pointer, relative to its value on function entry, is known at compile time so the code generator can generate the offset reliably. This does not work when:
the language runtime provides non-trivial alignment guarantees. This is particularly a problem in 32-bit code when the stack frame contains 8-byte variables, like double. Accessing a misaligned variable is very expensive (x2 if misaligned by 4, x3 if it straddles an L1 cache line) and might invalidate a memory model guarantee. The code generator cannot normally assume that the function is entered with an aligned stack, so it needs to generate alignment code in the function prologue, which can cause the stack pointer to decrement by an extra 4 bytes.
the language runtime provides a way for a program to dynamically allocate stack space. This is very common and desirable, since stack memory is very cheap and fast. Examples are alloca() in the CRT, variable-length arrays in C99+, and the stackalloc keyword in the C# language.
the language runtime needs to provide a reliable way to walk the stack. This is common in exception handling, in the implementation of a sandbox that needs to be able to discover the caller's rights, and in garbage-collected languages that need to be able to discover pointers to objects. There are many possible ways to do this, of course, but using the base pointer and storing the caller's base pointer in a known location in the stack frame makes it simple.
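To make the dynamic-allocation case concrete, here is a minimal C sketch, assuming a compiler that supports variable-length arrays (they are optional since C11). The function names scratch_bytes and sum_of_squares are made up for illustration. The point is only that the size of the frame depends on a run-time argument, so the distance between the stack pointer and the other locals cannot be a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports how many bytes of stack the VLA occupies. */
size_t scratch_bytes(size_t n)
{
    assert(n > 0);            /* a zero-length VLA is undefined behavior */
    int scratch[n];           /* size is only known at run time */
    return sizeof scratch;    /* sizeof on a VLA is evaluated at run time */
}

/* Hypothetical helper: sums i*i for i in [0, n), staging values in a VLA. */
long sum_of_squares(size_t n)
{
    assert(n > 0);
    int scratch[n];           /* the frame grows by n * sizeof(int) bytes */
    long total = 0;

    for (size_t i = 0; i < n; i++)
        scratch[i] = (int)(i * i);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;
}
```

Calling scratch_bytes(3) and scratch_bytes(10) from the same program gives two different frame sizes for the same function, which is why a code generator typically keeps a separate frame pointer in functions like these.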