c++x86-64allocastack-allocation

How can I emulate a stack frame in C++?


I am writing a container that uses alloca internally to allocate data on the stack. Risks of using alloca aside, assume that I must use it for the domain I am in (it's partly a learning exercise around alloca and partly to investigate possible implementations of dynamically-sized stack-allocated containers).

According to the man page for alloca (emphasis mine) :

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

Using implementation-specific features, I have managed to force inlining in such a way that the callers stack is used for this function-level "scoping".

However, that means that the following code will allocate a huge amount of memory on the stack (compiler optimisations aside):

for(auto iteration : range(0, 10000)) {
    // the ctor parameter is the number of
    // instances of T to allocate on the stack,
    // it's not normally known at compile-time
    my_container<T> instance(32);
}

Without knowing the implementation details of this container, one might expect any memory it allocates to be free'd when instance goes out of scope. This is not the case and can result in a stack overflow / high memory usage for the duration of the enclosing function.

One approach that came to mind was to explicitly free the memory in the destructor. Short of reverse engineering the resulting assembly, I haven't found a way of doing that yet (also see this).

The only other approach I have thought of is to have a maximum size specified at compile-time, use that to allocate a fixed-size buffer, have the real size specified at runtime and use the fixed-size buffer internally. The issue with this is that it's potentially very wasteful (suppose your maximum were 256 bytes per container, but you only needed 32 most of the time).

Hence this question; I want to find a way to provide these scope semantics to the users of this container. Non-portable is fine, so long as it's reliable on the platform its targeting (for example, some documented compiler extension that only works for x86_64 is fine).

I appreciate this could be an XY problem, so let me restate my goals clearly:


Solution

  • I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).

    The normal implementation of C VLAs in most compilers is on the stack. Of course ISO C++ doesn't say anything about how automatic storage is implemented under the hood, but it's (nearly?) universal for C implementations on normal machines (that do have a call+data stack) to use that for all automatic storage including VLAs.

    If your VLA is too large, you get a stack overflow rather than a fallback to malloc / free.

    Neither C nor C++ specify alloca; it's only available on implementations that have a stack like "normal" machines, i.e. the same machines where you can expect VLAs to do what you want.

    All of these conditions hold for all the major compilers on x86-64 (except that MSVC doesn't support VLAs).


    If you have a C++ compiler that supports C99 VLAs (like GNU C++), smart compilers may reuse the same stack memory for a VLA with loop scope.


    have a maximum size specified at compile-time, use that to allocate a fixed-size buffer ... wasteful

    For a special case like you mention, you could maybe have a fixed-size buffer as part of the object (size as a template param), and use that if it's big enough. If not, dynamically allocate. Maybe use a pointer member to point to either the internal or external buffer, and a flag to remember whether to delete it or not in the destructor. (You need to avoid delete on an array that's part of the object, of course.)

    // optionally static_assert (! (internalsize & (internalsize-1), "internalsize not a power of 2")
    // if you do anything that's easier with a power of 2 size
    template <type T, size_t internalsize>
    class my_container {
        T *data;
        T internaldata[internalsize];
        unsigned used_size;
        int allocated_size;   // intended for small containers: use int instead of size_t
        // bool needs_delete;     // negative allocated size means internal
    }
    

    The allocated_size only needs to be checked when it grows, so I made it signed int so we can overload it instead of needing an extra boolean member.

    Normally a container uses 3 pointers instead of pointer + 2 integers, but if you don't grow/shrink often then we save space (on x86-64 where int is 32 bits and pointers are 64-bit), and allow this overloading.

    A container that grows large enough to need dynamic allocation should continue using that space but then shrinks should keep using the dynamic space, so it's cheaper to grow again, and to avoid copying back into the internal storage. Unless the caller uses a function to release unused excess storage, then copy back.

    A move constructor should probably keep allocation as-is, but a copy constructor should copy into the internal buffer if possible instead of allocating new dynamic storage.