I've written a small working plugin server. The plugins are implemented as .so shared objects, which the "server" loads manually at runtime through calls to dlopen (header <dlfcn.h>).
All of the shared object plugins have the same interface:
extern "C" void* do_something() {
    return SharedAllocator<T>{}.allocate(...); // new T
}
extern "C" size_t id = ...; // unique
do_something returns a pointer to heap memory that the caller is expected to free. id is simply an identifier unique to each .so. T is a struct specific to each .so. Some of them share the same return type, some of them don't. The point here is that sizeof(T) is .so specific.
The server is in charge of dynamically loading the .so binaries and reading their symbols. All .so plugins can call each other through a function do_something_proxy defined in the server binary, which acts as the glue between the callers and the callees:
extern "C" void* do_something_proxy(size_t id) {
    // find the requested handle
    auto handle = some_so_map.find(id)->second;
    // call the handle's `do_something`
    void* something_done = handle.do_something();
    // forward the result
    return something_done;
}
To simplify things a bit, let's say that some_so_map is a plain std::unordered_map<size_t, so_handle_t> filled using a bunch of calls to dlopen when the proxy is executed.
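As a rough illustration of how such a map might be filled, here is a minimal sketch. The layout of so_handle_t and the helper register_plugin are assumptions made for this example (the original only shows that a handle exposes do_something); dlopen, dlsym and dlclose are the real POSIX calls from <dlfcn.h>.

```cpp
#include <dlfcn.h>

#include <cstddef>
#include <unordered_map>

// Hypothetical handle layout, assumed for this sketch.
struct so_handle_t {
    void* library = nullptr;            // handle returned by dlopen
    void* (*do_something)() = nullptr;  // resolved "do_something" symbol
};

std::unordered_map<std::size_t, so_handle_t> some_so_map;

// Hypothetical helper: loads one plugin and registers it under the id it exports.
// Returns false if the library or one of its two symbols cannot be resolved.
bool register_plugin(const char* path) {
    void* library = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (library == nullptr) {
        return false;
    }
    void* id_symbol = dlsym(library, "id");
    void* fn_symbol = dlsym(library, "do_something");
    if (id_symbol == nullptr || fn_symbol == nullptr) {
        dlclose(library);
        return false;
    }
    so_handle_t handle;
    handle.library = library;
    // POSIX permits converting the void* returned by dlsym to a function pointer.
    handle.do_something = reinterpret_cast<void* (*)()>(fn_symbol);
    // "id" is a data symbol, so dlsym yields the address of the size_t itself.
    some_so_map[*static_cast<std::size_t*>(id_symbol)] = handle;
    return true;
}
```

Calling register_plugin once per plugin path would populate the map. On older glibc versions this needs to be linked with -ldl.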
My issue is that every caller of do_something_proxy knows T at compile time. As I said earlier, T can vary from call site to call site; however, T never changes for any given call site.
For reference, here's the definition all callers use:
template <typename T, size_t id>
T* typed_do_something_proxy() {
    // simple cast of the proxy
    return reinterpret_cast<T*>(do_something_proxy(id));
}
In other words, do_something_proxy for some arbitrary plugin id always has the same return type.
If it wasn't for the proxy, I could just template do_something_proxy and pass T or an std::array<int8_t, N> with sizeof(T) == N, and the unnecessary memory allocated to ensure T is not sliced when calling do_something_proxy could be moved to the stack. However, the proxy cannot be aware of all possible return types at compile time and export a zillion versions of do_something_proxy.
So my question is: is there any way for do_something_proxy to allocate the effective size of T on its stack (i.e. using alloca or some other form of stack allocation)?
As far as I can tell, alloca doesn't seem to work here, as do_something_proxy can only receive a single value from the do_something function of the requested plugin. do_something_proxy would receive both the size to allocate and the bytes to copy to the allocated memory at the same time. If only alloca could be "squished" in between...
I know I could allocate a fixed amount of memory on the stack using an std::array<int8_t, N> with 256 or even 1024 for values of N. However, this solution is a bit dirty. It unnecessarily copies data from one stack frame to another, and limits the amount of data that a plugin can return. To top it off (while I haven't benchmarked this solution yet), unless compilers can elide copies across dynamic boundaries, I'd assume copying 1024 bytes is more work than copying e.g. sizeof(std::string) bytes.
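For concreteness, the fixed-size variant described above could look roughly like the sketch below. The names kMaxResultSize, plugin_result and the *_fixed functions are made up for this illustration, the plugin is simulated by an ordinary function, and the sketch only handles trivially copyable types, since bytes are moved around with memcpy.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr std::size_t kMaxResultSize = 256;  // assumed upper bound (N in the text)
using result_buffer = std::array<std::int8_t, kMaxResultSize>;

// Hypothetical plugin-specific T.
struct plugin_result {
    int code;
    double value;
};

// Stand-in for a plugin's do_something: always returns all N bytes by value.
result_buffer do_something_fixed() {
    static_assert(sizeof(plugin_result) <= kMaxResultSize, "T must fit in the buffer");
    result_buffer bytes{};
    plugin_result result{42, 1.5};
    std::memcpy(bytes.data(), &result, sizeof result);
    return bytes;
}

// Stand-in for the proxy: forwards the whole buffer, whatever sizeof(T) is.
result_buffer do_something_proxy_fixed() {
    return do_something_fixed();
}

template <typename T>
T typed_do_something_proxy_fixed() {
    static_assert(sizeof(T) <= kMaxResultSize, "T must fit in the buffer");
    static_assert(std::is_trivially_copyable_v<T>, "memcpy requires a trivially copyable T");
    result_buffer bytes = do_something_proxy_fixed();
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));  // only sizeof(T) bytes are meaningful
    return value;
}
```

The two static_asserts make the drawbacks visible: the size cap is baked into the interface, and every call moves kMaxResultSize bytes no matter how small T is.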
In an ideal world, I believe do_something_proxy should return a struct that handles this with RAII. A const std::any that is stack-allocated, if you will. Is this even possible?
If this is not possible at all within C++, would it be possible to achieve this behavior in a portable manner in assembly, i.e. by hijacking the stack or base pointers manually?
Thanks.
Actually, I just found a solution. It boils down to inverting the direction in which the memory location for the allocation of T is passed around.
Is there any way for do_something_proxy to allocate the effective size of T on its stack?
Maybe. But what the code actually needs is an allocation of the effective size of T at the caller's location, not inside the proxy. And since the caller knows sizeof(T), all you have to do is allocate the space for T on the stack of the caller before calling do_something, and then pass the address of the allocated buffer to do_something_proxy when calling it:
For the caller:
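template <typename T, size_t id>
T typed_do_something_proxy() {
    // raw, suitably aligned storage for one T on the caller's stack
    std::aligned_storage_t<sizeof(T), alignof(T)> return_buffer;
    do_something_proxy(id, &return_buffer);
    T* constructed = std::launder(reinterpret_cast<T*>(&return_buffer));
    T result = std::move(*constructed);
    constructed->~T(); // raw storage never runs T's destructor on its own
    return result;
}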
For the proxy:
extern "C" void do_something_proxy(size_t id, void* return_buffer) {
    auto handle = some_so_map.find(id)->second;
    handle.do_something(return_buffer);
}
For the callee:
extern "C" void do_something(void* return_buffer) {
    new (return_buffer) T(...); // placement new
}
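To see the three pieces working together without any real shared objects, here is a self-contained sketch that simulates everything in a single translation unit. It rests on several assumptions: so_handle_t is reduced to one function pointer, the plugin (id 7, with T = std::string) is an ordinary function rather than a dlsym'ed symbol, and the proxy returns a bool for unknown ids, which the snippets above do not do.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for so_handle_t; a real server would fill this from dlsym.
struct so_handle_t {
    void (*do_something)(void* return_buffer);
};

// Hypothetical plugin with id 7 whose T is std::string.
void string_plugin_do_something(void* return_buffer) {
    new (return_buffer) std::string("hello from plugin 7");  // placement new
}

std::unordered_map<std::size_t, so_handle_t> some_so_map = {
    {7, so_handle_t{&string_plugin_do_something}},
};

// Proxy: forwards the caller's buffer and never needs to know T.
// Returning false for an unknown id is an addition made for this sketch.
bool do_something_proxy(std::size_t id, void* return_buffer) {
    auto it = some_so_map.find(id);
    if (it == some_so_map.end()) {
        return false;
    }
    it->second.do_something(return_buffer);
    return true;
}

// Caller: owns the storage, so the proxy allocates nothing at all.
template <typename T, std::size_t id>
T typed_do_something_proxy() {
    alignas(T) unsigned char return_buffer[sizeof(T)];
    if (!do_something_proxy(id, return_buffer)) {
        throw std::out_of_range("unknown plugin id");
    }
    T* constructed = std::launder(reinterpret_cast<T*>(return_buffer));
    T result = std::move(*constructed);
    constructed->~T();  // destroy the object that the plugin constructed
    return result;
}
```

A call such as typed_do_something_proxy<std::string, 7>() then yields the string built by the plugin, with the only storage for it living in the caller's stack frame.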