c vulkan dynamic-linking dynamic-loading libdl

Dynamically loading a function in a shared library causes a segmentation fault


I have this simple library

lib.h:

int lib();

lib.c:

#include <stdio.h>

#include <dlfcn.h>

#define VK_NO_PROTOTYPES
#include <vulkan/vulkan.h>

PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr;
PFN_vkEnumerateInstanceLayerProperties vkEnumerateInstanceLayerProperties;

int lib()
{
    void *lib = dlopen("libvulkan.so.1", RTLD_NOW);
    vkGetInstanceProcAddr = dlsym(lib, "vkGetInstanceProcAddr");

    vkEnumerateInstanceLayerProperties = (PFN_vkEnumerateInstanceLayerProperties)vkGetInstanceProcAddr(NULL, "vkEnumerateInstanceLayerProperties");
    uint32_t count;
    vkEnumerateInstanceLayerProperties(&count, NULL);
    printf("%d\n", count);

    return 0;
}

I compile it to a shared library using

libabc.so: lib.o
    $(CC) -shared -o $@ $^ -ldl

lib.o: lib.c lib.h
    $(CC) -fPIC -g -Wall -c -o $@ $<

But when I use this library in an application, I get a segfault when vkEnumerateInstanceLayerProperties is called on line 18 of lib.c.

What's more, if I rename vkEnumerateInstanceLayerProperties to something else, say test, then everything works just fine and (on my system) 6 is printed. It also works if I don't use a dynamic library at all, i.e. I compile lib.c together with main.c directly without -fPIC.

What is causing this and how do I resolve it?


Solution

  • The problem is that these two definitions:

    PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr;
    PFN_vkEnumerateInstanceLayerProperties vkEnumerateInstanceLayerProperties;
    

    define global symbols named vkGetInstanceProcAddr and vkEnumerateInstanceLayerProperties in libabc.so.

    These definitions override the ones inside libvulkan, so the vkGetInstanceProcAddr(NULL, "vkEnumerateInstanceLayerProperties"); call returns the address of the definition inside libabc.so instead of the intended one inside libvulkan.so.1. That symbol is a data object (it lives in the .bss section), not a function, so an attempt to call it (naturally) produces a SIGSEGV.

    To fix this, either make these symbols static, or name them differently, e.g. p_vkGetInstanceProcAddr and p_vkEnumerateInstanceLayerProperties.
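
    As a rough sketch of that fix, the pattern below applies both suggestions at once: the pointer is static and carries a p_ prefix. To keep it buildable without the Vulkan headers, libm's cos stands in for the Vulkan entry point; the names pfn_cos, p_cos and demo_load_cos are hypothetical, and the "libm.so.6" soname assumes a glibc-based Linux system. In lib.c the equivalent change would be to declare the two PFN_ pointers as static p_vkGetInstanceProcAddr and p_vkEnumerateInstanceLayerProperties.

    ```c
    #include <dlfcn.h>
    #include <stdio.h>

    /* Signature of cos(); plays the role of a PFN_vk... typedef. */
    typedef double (*pfn_cos)(double);

    /* static: internal linkage, so no dynamic symbol is emitted for it.
     * p_ prefix: even without static, the name would not collide with "cos". */
    static pfn_cos p_cos;

    /* Returns 0 and stores cos(0.0) in *out on success, -1 on failure. */
    int demo_load_cos(double *out)
    {
        /* Assumption: glibc soname of the math library. */
        void *handle = dlopen("libm.so.6", RTLD_NOW);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return -1;
        }

        /* Conversion idiom from the POSIX dlsym documentation. */
        *(void **)(&p_cos) = dlsym(handle, "cos");
        if (p_cos == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            dlclose(handle);
            return -1;
        }

        *out = p_cos(0.0);

        /* Do not keep a pointer into a library we are about to close. */
        p_cos = NULL;
        dlclose(handle);
        return 0;
    }
    ```

    On older glibc versions this needs -ldl at link time, as in the Makefile from the question. Checking the results of dlopen and dlsym is not part of the fix itself, but it turns a missing library into an error message rather than a crash.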

    Update:

    Why does compiling lib.c together with main.c directly (without an intermediate shared library) work?

    Because, by default, an executable does not export its symbols in the dynamic symbol table unless some shared library it is linked against references them. The two pointer variables are therefore invisible to libvulkan, and its own definitions are used.

    You can change the default by adding -Wl,--export-dynamic (which causes the main executable to export all non-local symbols) to the main executable link line. If you do so, the build that links lib.c with main.c directly will also crash.

    Also, how can vkGetInstanceProcAddr "capture" the vkEnumerateInstanceLayerProperties in libabc.so?

    By using normal symbol resolution rules: the first ELF object in the search order that defines the symbol wins.

    Shouldn't it just return some kind of predefined address that points to the correct function? I imagine that it is implemented with something like if (!strcmp(...)) return vkGetInstanceProcAddr_internal.

    If it were implemented this way, it would have worked.

    The implementation I can find doesn't do the ..._internal part:

    void *globalGetProcAddr(const char *name) {
        if (!name || name[0] != 'v' || name[1] != 'k') return NULL;
    
        name += 2;
        if (!strcmp(name, "CreateInstance")) return vkCreateInstance;
        if (!strcmp(name, "EnumerateInstanceExtensionProperties")) return vkEnumerateInstanceExtensionProperties;
    ...
    

    Arguably that is an implementation bug -- it should return the address of a local alias (the ..._internal symbol) and be immune to symbol overriding.
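
    A minimal sketch of the local-alias approach, assuming a toy library rather than the real loader: every name here (demo_fn, demoEnumerateLayers, demoEnumerateLayers_internal, demoGetProcAddr) is hypothetical, and the value 6 is only a placeholder. The lookup function returns the static implementation, so whatever another ELF object defines under the exported name has no effect on the address it hands out.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Generic function-pointer type; plays the role of PFN_vkVoidFunction. */
    typedef void (*demo_fn)(void);

    /* Internal implementation: static, so it has no dynamic symbol and
     * cannot be overridden by a same-named symbol in another ELF object. */
    static int demoEnumerateLayers_internal(uint32_t *count)
    {
        *count = 6; /* placeholder value for the demo */
        return 0;
    }

    /* Exported entry point: references to this name can be interposed. */
    int demoEnumerateLayers(uint32_t *count)
    {
        return demoEnumerateLayers_internal(count);
    }

    /* Lookup that hands out the internal alias, not the exported name. */
    demo_fn demoGetProcAddr(const char *name)
    {
        if (name == NULL)
            return NULL;
        if (strcmp(name, "demoEnumerateLayers") == 0)
            return (demo_fn)demoEnumerateLayers_internal;
        return NULL;
    }
    ```

    The caller casts the returned pointer back to the real signature before calling it, just as lib.c does with the PFN_ typedefs. Giving the exported symbols hidden visibility or linking the library with -Wl,-Bsymbolic are other ways to get a similar effect, though whether they suit a given library depends on how it is meant to be used.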