c++ casting shared-libraries move-semantics alloca

Allocating structs of arbitrary constant size on the stack


I've written a small working plugin server. The plugins are implemented using .so shared objects, which are manually loaded during runtime in the "server" by calls to dlopen (header <dlfcn.h>).

All of the shared object plugins have the same interface:

extern "C" void* do_something() {
    return SharedAllocator<T>{}.allocate(...); // new T
}
extern "C" size_t id = ...; // unique

The server is in charge of dynamically loading and reading the symbols of the .so binaries. All .so plugins can call each other through a method do_something_proxy defined in the server binary, which acts as the glue between the callers and the callees:

extern "C" void* do_something_proxy(size_t id) {
    // find the requested handle
    auto handle = some_so_map.find(id)->second;

    // call the handle's `do_something`
    void* something_done = handle.do_something();

    // forward the result
    return something_done;
}

To simplify things a bit, let's say that some_so_map is a plain std::unordered_map<size_t, so_handle_t> filled using a bunch of calls to dlopen when the proxy is executed.

My issue is that every caller of do_something_proxy knows T at compile time. As I said earlier, T can vary from call site to call site; however T never changes for an arbitrary call site.

For reference, here's the definition all callers use:

template <typename T, size_t id>
T* typed_do_something_proxy() {
    // simple cast of the proxy
    return reinterpret_cast<T*>(do_something_proxy(id));
}

In other words, do_something_proxy for some arbitrary plugin id always has the same return type.

If it weren't for the proxy, I could just template do_something_proxy and pass T or a std::array<int8_t, N> with sizeof(T) == N, and the unnecessary memory allocated to ensure T is not sliced when calling do_something_proxy could be moved to the stack. However, the proxy cannot be aware of all possible return types at compile time and export a zillion versions of do_something_proxy.

So my question is, is there any way for do_something_proxy to allocate the effective size of T on its stack (i.e. using alloca or some other form of stack allocation)?

As far as I can tell, alloca doesn't seem to work here, as do_something_proxy can only receive a single value from the do_something function of the requested plugin. do_something_proxy would receive both the size to allocate and the bytes to copy to the allocated memory at the same time. If only alloca could be "squished" in between...

I know I could allocate a fixed amount of memory on the stack using a std::array<int8_t, N> with 256 or even 1024 for values of N. However, this solution is a bit dirty. It unnecessarily copies data from one stack frame to another, and limits the amount of data that a plugin can return. To top it off (while I haven't benchmarked this solution yet), unless compilers can elide copies across dynamic boundaries, I'd assume copying 1024 bytes is more work than copying, e.g., sizeof(std::string) bytes.

In an ideal world, I believe do_something_proxy should return a struct that handles this with RAII. A const std::any that is stack-allocated, if you will. Is this even possible?

If this is not possible at all within C++, would it be possible to achieve this behavior in a portable manner in assembly, e.g. by hijacking the stack or base pointers manually?

Thanks.


Solution

  • Actually, I just found a solution. It boils down to inverting the direction in which the memory location for the allocation of T is passed around.

    Is there any way for do_something_proxy to allocate the effective size of T on its stack?

    Maybe. But what the code actually needs is an allocation of the effective size of T at the caller's location, not inside the proxy. And since the caller knows sizeof(T), all you have to do is allocate the space for T on the caller's stack before calling do_something_proxy, and then pass the address of that buffer to the proxy, which forwards it to the plugin's do_something:

    For the caller:

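    template <typename T, size_t id>
    T typed_do_something_proxy() {
        std::aligned_storage_t<sizeof(T), alignof(T)> return_buffer;
        do_something_proxy(id, &return_buffer);
        // The callee constructed a T in return_buffer via placement new.
        T* constructed = std::launder(reinterpret_cast<T*>(&return_buffer));
        T result = std::move(*constructed); // move the value out of the buffer
        constructed->~T();                  // end the lifetime of the buffered object
        return result;
    }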
    

    For the proxy:

    extern "C" void do_something_proxy(size_t id, void* return_buffer) {
        auto handle = some_so_map.find(id)->second;
        handle.do_something(return_buffer);
    }
    

    For the callee:

    extern "C" void do_something(void* return_buffer) {
        new(return_buffer) T(...); // placement new
    }
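
Putting the three pieces together, here is a minimal single-process sketch of the same idea. It is only an illustration under some assumptions: string_plugin_do_something, plugin_table and destroy_guard are hypothetical names, the table stands in for some_so_map (which the real server would fill via dlopen/dlsym), and the plugin simply returns a std::string. The guard is one possible way to make sure the object built in the buffer is destroyed after its value has been moved out.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical plugin entry point: constructs its result directly in the
// storage provided by the caller instead of allocating it on the heap.
extern "C" void string_plugin_do_something(void* return_buffer) {
    new (return_buffer) std::string("hello from plugin"); // placement new
}

using do_something_fn = void (*)(void*);

// Stand-in for some_so_map (assumption: maps a plugin id to its entry point).
static std::unordered_map<std::size_t, do_something_fn>& plugin_table() {
    static std::unordered_map<std::size_t, do_something_fn> table{
        {1, &string_plugin_do_something}};
    return table;
}

// The proxy never needs to know T; it only forwards the buffer address.
extern "C" void do_something_proxy(std::size_t id, void* return_buffer) {
    assert(return_buffer != nullptr);
    plugin_table().at(id)(return_buffer);
}

template <typename T, std::size_t id>
T typed_do_something_proxy() {
    // Suitably sized and aligned storage on the caller's stack.
    alignas(T) unsigned char return_buffer[sizeof(T)];
    do_something_proxy(id, return_buffer);

    T* constructed = std::launder(reinterpret_cast<T*>(return_buffer));

    // Destroys the buffered object when this function exits, i.e. after the
    // return value has been move-constructed from it.
    struct destroy_guard {
        T* object;
        ~destroy_guard() { object->~T(); }
    } guard{constructed};

    return std::move(*constructed);
}
```

A caller would then write something like std::string s = typed_do_something_proxy<std::string, 1>(); and no heap allocation is needed for the T itself (the string may of course still allocate its own character storage).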