Tags: c++, linker, buck

Why is link_whole not adding all symbols on Linux?


I have a test library. Header file ObjectA.h:

#pragma once

namespace sarora::testing {
void testingObject();
void againTesting();

} // namespace sarora::testing

Source file ObjectA.cpp:

#include "ObjectA.h"
#include <iostream>
namespace sarora::testing {

void testingObject() {
  std::cout << "testingObject" << std::endl;
}

void againTesting() {
  std::cout << "againTesting" << std::endl;
}

} // namespace sarora::testing

I have defined the Buck target for it as:

cpp_library(
    name = "object",
    srcs = ["ObjectA.cpp"],
    headers = ["ObjectA.h"],
    link_whole = True,
)

Once the library is done, I use it from main.cpp:

#include "ObjectA.h"
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
  sarora::testing::testingObject();
}

This is the Buck target for the final binary:

cpp_binary(
    name = "test",
    srcs = ["test.cpp"],
    deps = [
        ":object",
    ],
)

Note that main.cpp only calls testingObject(). When I check the symbol table with "nm main_executable_path | grep testingObject", the symbol is there.

But "grep againTesting" finds nothing. So what does link_whole, as documented at https://buck.build/rule/cxx_library.html#link_whole, actually do here?


Solution

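    What you are expecting to achieve here is for test to contain definitions of all the symbols defined in the object library, including againTesting(), whether or not test calls them.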

    Why you can't do that with a shared library

    This is impossible by the nature of a shared library, as distinct from a static library. (I'll stick with the usual unix-style naming conventions for libraries - libfoo.so is the shared library build of the foo library; libfoo.a is the static library build - although the buck build system has slightly different ones.)

    When you link an executable against a shared library libfoo.so, no part of libfoo.so is statically copied into your program. If your program contains no undefined references to symbols defined by libfoo.so then by default (that is, with the linker's --as-needed behaviour in effect, as some toolchains arrange, e.g. Ubuntu's GCC) absolutely nothing about libfoo.so is written into your program. It might as well not exist. If your program does make an undefined reference to some symbol sym defined by libfoo.so, and libfoo.so is the first library the static linker finds that defines sym, then the static linker merely writes a note in the executable saying, in effect, "This program needs libfoo.so" (a NEEDED entry in its dynamic section).

    By default that is all that happens. sym remains an undefined symbol in the executable. It is left to the runtime linker to notice that note when the executable is loaded to run the program, search for libfoo.so, load it into the program's address space and resolve any undefined dynamic symbols in the program that are defined by libfoo.so, or by any other shared libraries that the program needs.
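    As a way to picture this, here is a toy model in C++ of what the static linker does with shared libraries. It is not how GNU ld or lld are written. The SharedLib and LinkResult types and the link_against_shared function are hypothetical names for illustration, and the as_needed flag stands for the --as-needed behaviour. The model encodes one point: a shared library never contributes a definition to the executable. It contributes only a NEEDED entry, and the referenced symbol stays undefined for the runtime linker to resolve.

```cpp
// Toy model (not real linker code) of static linkage against shared libraries.
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct SharedLib {                   // hypothetical type for the model
    std::string soname;              // e.g. "lib_foo.so"
    std::set<std::string> defines;   // dynamic symbols it exports
};

struct LinkResult {
    std::vector<std::string> needed;          // NEEDED entries, in link order
    std::set<std::string> undefined_dynamic;  // still UND, left for ld.so
};

// Resolves the executable's undefined references against shared libraries
// in command-line order: the first library that defines a symbol claims it.
LinkResult link_against_shared(std::set<std::string> unresolved,
                               const std::vector<SharedLib>& libs,
                               bool as_needed) {
    LinkResult out;
    for (const SharedLib& lib : libs) {
        bool used = false;
        for (auto it = unresolved.begin(); it != unresolved.end();) {
            if (lib.defines.count(*it)) {
                // No definition is copied in: the symbol stays undefined in
                // the executable and is only resolved at load time.
                out.undefined_dynamic.insert(*it);
                it = unresolved.erase(it);
                used = true;
            } else {
                ++it;
            }
        }
        // With --as-needed, an unused library leaves no trace at all.
        if (used || !as_needed) out.needed.push_back(lib.soname);
    }
    // Anything still in `unresolved` would be an "undefined reference" error.
    return out;
}
```

    With hello_world as the only undefined reference, lib_foo.so ends up in needed and hello_world in undefined_dynamic. goodbye_world never appears anywhere, which is what the question observed for againTesting().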

    You can override the default behaviour by passing the option --no-as-needed to the static linker. But that will merely make it write the "This program needs libfoo.so" note in the executable even when it is not true, i.e. when the program does not actually make undefined references to symbols defined by libfoo.so. That's all. A shared library is a library whose linkage with a program leaves dynamic symbol resolution entirely to the runtime linker. You don't even have to get the static linker to write notes in an executable to tell the runtime linker what shared libraries it needs. The program itself can call the runtime linker (via dlopen and dlsym) to find and load libfoo.so and hand back the addresses of symbols defined in it. That's doing it from first principles.
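    For the first-principles route, here is a minimal sketch using the POSIX dlopen/dlsym API. It assumes a glibc-based Linux system. libm.so.6 and its C function cos are used only because that library is certain to be present, and call_double_fn is a hypothetical helper name. dlsym looks up a symbol's linkage name. To find a C++ function such as void hello_world() you would therefore ask for its mangled name (_Z11hello_worldv under the Itanium C++ ABI), unless the function were declared extern "C".

```cpp
// Minimal sketch: ask the runtime linker directly for a library and a symbol.
// Assumes glibc Linux; on glibc older than 2.34, link with -ldl.
#include <cassert>
#include <dlfcn.h>
#include <optional>
#include <string>

// Loads `library`, looks up the C-linkage function `symbol` of type
// double(double) and calls it with `x`. Returns std::nullopt on failure.
std::optional<double> call_double_fn(const std::string& library,
                                     const std::string& symbol, double x) {
    void* handle = dlopen(library.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) return std::nullopt;          // library not found or not loadable
    void* addr = dlsym(handle, symbol.c_str());
    if (!addr) {                               // no such dynamic symbol
        dlclose(handle);
        return std::nullopt;
    }
    // void* to function pointer is conditionally-supported in C++, but POSIX
    // requires it to work for addresses returned by dlsym.
    auto fn = reinterpret_cast<double (*)(double)>(addr);
    double result = fn(x);
    dlclose(handle);
    return result;
}
```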

    Why you can do that with a static library

    On the other hand when you link an executable against a static library libfoo.a, what goes on is completely different, as described by the Stackoverflow tag-wiki for static-libraries. libfoo.a is a bag of object files from which the static linker will select just the ones it needs to resolve symbols referenced but not already defined in the executable, take them out of the bag and statically link them into the executable like any other object files in the linkage. Nothing but an object file can be statically linked into an executable. That means that nothing but the linkage of an object file can make the linker physically incorporate symbol definitions into an executable.
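    Here is a toy C++ model of that "bag of object files" selection, under simplifying assumptions: one archive, no weak symbols, and a scan of each member's definitions in place of the archive's symbol index. ObjectFile and select_members are hypothetical names. A member is pulled in only if it defines a symbol that is currently undefined. Once pulled in, it contributes all its definitions and all its own undefined references, and those references may pull in further members. The whole_archive flag models forcing every member in.

```cpp
// Toy model (not real linker code) of how a static linker extracts archive members.
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct ObjectFile {                    // hypothetical type for the model
    std::string name;
    std::set<std::string> defines;     // symbols it defines
    std::set<std::string> references;  // symbols it uses
};

// Returns the archive members that get linked, in extraction order.
// `defined`/`undefined` describe the link state when the archive is reached.
std::vector<std::string> select_members(std::set<std::string> defined,
                                        std::set<std::string> undefined,
                                        const std::vector<ObjectFile>& archive,
                                        bool whole_archive) {
    std::vector<std::string> linked;
    std::vector<bool> taken(archive.size(), false);
    bool progress = true;
    while (progress) {  // rescan until no new member is needed
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (taken[i]) continue;
            const ObjectFile& obj = archive[i];
            bool needed = whole_archive;
            for (const std::string& sym : obj.defines)
                if (undefined.count(sym)) needed = true;
            if (!needed) continue;
            // The whole member is linked, including definitions nobody asked for.
            taken[i] = true;
            progress = true;
            linked.push_back(obj.name);
            for (const std::string& sym : obj.defines) {
                defined.insert(sym);
                undefined.erase(sym);
            }
            for (const std::string& sym : obj.references)
                if (!defined.count(sym)) undefined.insert(sym);
        }
    }
    return linked;
}
```

    Take an archive holding foo.o (defines hello_world and goodbye_world, references helper), bar.o and helper.o, with only hello_world undefined. The model links foo.o and then helper.o. With whole_archive set, it links all three.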

    Sometimes you may want the static linker to take all of the object files out of the bag and link them in the executable, whether they are needed or not. To do that you use the linker options:

    --whole-archive libfoo.a ... --no-whole-archive
    

    if you're invoking the static linker directly. Or:

    -Wl,--whole-archive libfoo.a ... -Wl,--no-whole-archive
    

    if you're invoking it via GCC/Clang, as usual. (It is vital to turn off --whole-archive after all the libraries you want it to apply to, because it continues to apply to subsequent libraries until you do.)
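    The "continues to apply" behaviour is easy to get wrong, so here is a small C++ sketch that walks a linker argument list and reports which archives end up in whole-archive mode. It is a simplification, and whole_archives is a hypothetical helper. It only recognises archives given by a path ending in .a, not -lfoo. It also reflects the fact that the option has no effect on shared libraries or plain object files.

```cpp
// Simplified sketch: which archives does --whole-archive actually cover?
#include <cassert>
#include <string>
#include <vector>

static bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Accepts both the raw linker spelling and the GCC/Clang -Wl, spelling.
std::vector<std::string> whole_archives(const std::vector<std::string>& args) {
    std::vector<std::string> result;
    bool whole = false;  // a mode switch, not a per-library attribute
    for (const std::string& arg : args) {
        if (arg == "--whole-archive" || arg == "-Wl,--whole-archive") {
            whole = true;
        } else if (arg == "--no-whole-archive" || arg == "-Wl,--no-whole-archive") {
            whole = false;
        } else if (whole && ends_with(arg, ".a")) {
            result.push_back(arg);
        }
        // A .so or .o seen while `whole` is on is linked exactly as usual.
    }
    return result;
}
```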

    libfoo.a can just be replaced with the usual linkage option -lfoo if static linkage is in effect when you do this: linker option -Bstatic, activated by GCC/Clang linkage option -static. While -Bstatic is in effect the linker will not resolve -lfoo to a shared library libfoo.so - which it does by default - and will only accept the static library libfoo.a, if it can find it. (-Bstatic also continues to be in effect until and unless the default behaviour is restored with -Bdynamic).
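    How -lfoo is resolved can be modelled the same way. This sketch follows the GNU ld documentation's rule that ld checks each search directory in turn, trying libfoo.so before libfoo.a in that directory, and that -Bstatic restricts the search to libfoo.a. resolve_lib is a hypothetical name, and the set of existing paths stands in for the filesystem. Real ld also handles linker scripts and other cases not modelled here.

```cpp
// Simplified sketch of -lNAME resolution under -Bdynamic (default) and -Bstatic.
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

std::optional<std::string> resolve_lib(const std::string& name,
                                       const std::vector<std::string>& search_dirs,
                                       const std::set<std::string>& existing,
                                       bool bstatic) {
    for (const std::string& dir : search_dirs) {
        if (!bstatic) {  // shared library preferred within the same directory
            std::string so = dir + "/lib" + name + ".so";
            if (existing.count(so)) return so;
        }
        std::string a = dir + "/lib" + name + ".a";
        if (existing.count(a)) return a;
    }
    return std::nullopt;  // ld would report: cannot find -lNAME
}
```

    Because the search is per directory, a libbar.a in an earlier directory wins over a libbar.so in a later one, even without -Bstatic.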

    Why your BUCK file doesn't build what you expect

    The link_whole = True option in your:

    cpp_library(
        name = "object",
        srcs = ["ObjectA.cpp"],
        headers = ["ObjectA.h"],
        link_whole = True,
    )
    

    should mean that:

    -Wl,--whole-archive <object-library-name> -Wl,--no-whole-archive
    

    gets written in the toolchain's linkage commandline for the program test as built by your:

    cpp_binary(
        name = "test",
        srcs = ["test.cpp"],
        deps = [
            ":object",
        ],
    )
    

    That's what would happen if <object-library-name> was a static library. But your <object-library-name> = libobject.so, a shared library. And that's because:

    and:

    So by default, buck builds libobject.so and links test against it. It knows that --whole-archive means nothing when applied to libobject.so, so it ignores link_whole = True: no error, no warning. Even if:

    -Wl,--whole-archive libobject.so -Wl,--no-whole-archive
    

    were passed to the linker, it would just ignore --[no-]whole-archive for libobject.so; no error, no warning.

    Buck completes the build of test successfully. test has a dynamic dependency on libobject.so, represented by a NEEDED entry for libobject.so in its dynamic section.

    That's all it gets from libobject.so.

    What you'd need to do to your BUCK file to see what you expected

    Before going here, remember that if you link a program against a shared library in order to resolve symbol sym, then you don't want and don't need to have a definition of sym in your program, and won't get one. If there was a definition of sym in your linked program, it could only have got there from an object file that defined sym before any shared library that defined it was reached, and any such shared library definition would have been ignored, because sym was already defined.

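    To see the outcome you expect for link_whole = True, you'd need to do one of:

    set preferred_linkage = "static" on the library target (shown with cxx_library in Build take #2 below)

    or:

    set link_style = "static" on the binary target (shown with cxx_binary in Build take #3 below).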

    Either way (or both together), the object library will be built as the static library libobject.a. --whole-archive will then be meaningful, and buck will apply it. The one and only object file libobject.a(object.o) will be extracted from libobject.a and statically linked into test, bringing with it all the symbol definitions in object.o, and you will see them in the global symbol table of test. (But not in its dynamic symbol table, because they don't need runtime resolution any more.)

    Since there will only be one object file in libobject.a, --whole-archive is of course redundant in this particular case. The linkage will need libobject.a(object.o) to resolve sarora::testing::testingObject(), so it will extract and link that object file without coercion. That object file will bring with it all the symbols it defines or references, including those that test does not need. When the linker consumes an object file, it consumes all of it.[1]

    For the same reason libobject.a itself is redundant in this particular case. You might as well just compile the object file object.o from ObjectA.cpp and link it directly.

    Bottom line: link_whole is meaningful if and only if you make sure the library you are applying it to is a static library. link_whole is useful if and only if you want to link all the object files in the static library, whether or not the linker needs them.

    No need to read on unless you're interested in seeing all this demonstrated.

    Demo all that with buck

    Source files:

    $ cat foo.cpp 
    #include <iostream>
    
    void hello_world()
    {
        std::cout << "Hello World" << std::endl;
    }
    
    void goodbye_world()
    {
        std::cout << "Goodbye World" << std::endl;
    }
    
    $ cat main.cpp 
    #include <iostream>
    
    extern void hello_world();
    
    int main() {
        hello_world();
        return 0;
    }
    

    BUCK file, v1:

    $ cat BUCK
    cxx_library(
        name = "foo",
        srcs = ["foo.cpp"],
        link_whole = True,
    )
    
    cxx_binary(
        name = "main",
        srcs = ["main.cpp"],
        deps = [
        ':foo',
        ],
    )
    
    # toolchains/BUCK
    load("@prelude//toolchains:cxx.bzl", "system_cxx_toolchain")
    load("@prelude//toolchains:python.bzl", "system_python_bootstrap_toolchain")
    
    system_cxx_toolchain(
        name = "cxx",
        visibility = ["PUBLIC"],
    )
    
    system_python_bootstrap_toolchain(
        name = "python_bootstrap",
        visibility = ["PUBLIC"],
    )
    

    Build, take #1:

    $ buck2 build //...
    Starting new buck2 daemon...
    Connected to new buck2 daemon.
    Build ID: b0ed2f4f-3d43-47cc-b9e4-19a53158dc3e
    Jobs completed: 62. Time elapsed: 0.3s.
    Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
    BUILD SUCCEEDED
    

    Run the program:

    $ ./buck-out/v2/gen/root/904931f735703749/__main__/main
    Hello World
    

    All good. Now look at its global symbol table and dynamic symbol table (demangled) for hits on hello_world():

    $ readelf -W --syms ./buck-out/v2/gen/root/904931f735703749/__main__/main | \
        c++filt | egrep '(Symbol table|Ndx|hello_world)'
    Symbol table '.dynsym' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND hello_world()
    Symbol table '.symtab' contains 32 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
        31: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND hello_world()
    

    hello_world() is an undefined (Ndx = UND) symbol mentioned once in the dynamic symbol table (.dynsym) and once in the global symbol table (.symtab).
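    If you want to check this mechanically rather than by eye, here is a minimal C++ sketch that classifies one row of readelf -W --syms output (after c++filt) by its Ndx column. It assumes the column layout shown above. SymEntry and parse_sym_line are hypothetical names, and symbols whose Ndx is ABS or COM are simply treated as defined.

```cpp
// Minimal sketch: parse one `readelf -W --syms | c++filt` row and decide
// whether the symbol is defined (section index) or undefined (UND).
// Assumed layout: Num: Value Size Type Bind Vis Ndx Name
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

struct SymEntry {
    std::string ndx;   // section index, or UND / ABS / COM
    std::string name;  // may contain spaces once demangled
    bool defined() const { return ndx != "UND"; }
};

std::optional<SymEntry> parse_sym_line(const std::string& line) {
    std::istringstream in(line);
    std::string num, value, size, type, bind, vis, ndx;
    if (!(in >> num >> value >> size >> type >> bind >> vis >> ndx))
        return std::nullopt;  // too few columns, e.g. "Symbol table ..." lines
    // Symbol rows start with "<digits>:"; this rejects the "Num:" header rows.
    if (num.size() < 2 || num.back() != ':' ||
        !std::isdigit(static_cast<unsigned char>(num[0])))
        return std::nullopt;
    std::string name;
    std::getline(in >> std::ws, name);  // the rest of the line is the name
    return SymEntry{ndx, name};
}
```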

    The runtime linker (ld.so) was able to define hello_world() and run the program because the static linker wrote the following dynamic section in the executable:

    $ readelf --dynamic ./buck-out/v2/gen/root/904931f735703749/__main__/main
    
    Dynamic section at offset 0x840 contains 31 entries:
      Tag        Type                         Name/Value
     0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/./__main__shared_libs_symlink_tree]
     0x0000000000000001 (NEEDED)             Shared library: [lib_foo.so]
     0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
     0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
     0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
     ...[cut]...
    

    which informed ld.so that lib_foo.so was needed before any of the other dynamic dependencies, and also told it where to look to find lib_foo.so, namely:

    (RUNPATH)            Library runpath: [$ORIGIN/./__main__shared_libs_symlink_tree]
    

    where indeed we find a symlink[2]:

    $ ls -l ./buck-out/v2/gen/root/904931f735703749/__main__/__main__shared_libs_symlink_tree
    total 0
    lrwxrwxrwx 1 imk imk 24 Apr 22 11:49 lib_foo.so -> ../../__foo__/lib_foo.so
    

    to the actual shared library:

    ./buck-out/v2/gen/root/904931f735703749/__foo__/lib_foo.so
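    To make the $ORIGIN lookup concrete (see footnote 2), here is a minimal C++ sketch of the substitution. It is pure string manipulation under stated assumptions. expand_origin is a hypothetical name, and only the $ORIGIN and ${ORIGIN} spellings are handled. The real runtime linker works out the directory from where the object was actually loaded, not from a path you hand it.

```cpp
// Minimal sketch: expand $ORIGIN in a RUNPATH entry to the directory that
// contains the object (here, the executable) in which the RUNPATH is written.
#include <cassert>
#include <string>

std::string expand_origin(std::string runpath_entry, const std::string& object_path) {
    std::string::size_type slash = object_path.rfind('/');
    std::string origin;
    if (slash == std::string::npos) origin = ".";    // bare file name
    else if (slash == 0) origin = "/";               // object in the root directory
    else origin = object_path.substr(0, slash);      // everything before the last '/'
    for (const std::string token : {"${ORIGIN}", "$ORIGIN"}) {
        std::string::size_type pos = 0;
        while ((pos = runpath_entry.find(token, pos)) != std::string::npos) {
            runpath_entry.replace(pos, token.size(), origin);
            pos += origin.size();  // don't rescan the text just inserted
        }
    }
    return runpath_entry;
}
```

    Applied to the RUNPATH above and the path of main, this yields buck-out/v2/gen/root/904931f735703749/__main__/./__main__shared_libs_symlink_tree, the directory that holds the lib_foo.so symlink.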
    

    The uncalled function void goodbye_world():

    $ readelf -W --syms ./buck-out/v2/gen/root/904931f735703749/__main__/main | \
        c++filt | egrep '(Symbol table|Ndx|goodbye_world)'
    Symbol table '.dynsym' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    Symbol table '.symtab' contains 32 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    

    does not appear in either symbol table.

    And as for the static library that link_whole = True might apply to:

    $ find . -name lib*.a; echo Done
    Done
    

    it doesn't exist. Let's look at the actual linkage arguments:

    $ cat ./buck-out/v2/gen/root/904931f735703749/__main__/main.linker.argsfile;
    "-fuse-ld=lld"
    -o
    buck-out/v2/gen/root/904931f735703749/__main__/main
    "-Wl,-rpath,\$ORIGIN/./__main__shared_libs_symlink_tree"
    buck-out/v2/gen/root/904931f735703749/__main__/__objects__/main.cpp.pic.o
    buck-out/v2/gen/root/904931f735703749/__foo__/lib_foo.so
    

    lib_foo.so is linked, i.e. the NEEDED entry was written into main; its runtime search path (-rpath) was also written into main; --whole-archive is absent.
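    The argsfile is a response file that the compiler driver reads as extra command-line arguments. Here is a minimal C++ sketch that splits such a file and picks out the shared libraries. It assumes GNU-style quoting: whitespace separates arguments, single or double quotes group characters, and a backslash makes the next character literal. That last rule is why \$ORIGIN in the file arrives as $ORIGIN. split_argsfile and shared_inputs are hypothetical names, and versioned names like libstdc++.so.6 are not matched.

```cpp
// Minimal sketch: tokenize a GNU-style response file and list .so inputs.
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> split_argsfile(const std::string& text) {
    std::vector<std::string> args;
    std::string cur;
    bool in_arg = false;  // distinguishes "" (an empty argument) from nothing
    char quote = 0;       // the open quote character, or 0 if none
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\\' && i + 1 < text.size()) {  // backslash: next char is literal
            cur += text[++i];
            in_arg = true;
        } else if (quote) {
            if (c == quote) quote = 0; else cur += c;
        } else if (c == '"' || c == '\'') {
            quote = c;
            in_arg = true;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            if (in_arg) { args.push_back(cur); cur.clear(); in_arg = false; }
        } else {
            cur += c;
            in_arg = true;
        }
    }
    if (in_arg) args.push_back(cur);
    return args;
}

// Inputs whose names end in ".so": the ones that only get a NEEDED entry.
std::vector<std::string> shared_inputs(const std::vector<std::string>& args) {
    std::vector<std::string> out;
    for (const std::string& a : args)
        if (a.size() > 3 && a.compare(a.size() - 3, 3, ".so") == 0) out.push_back(a);
    return out;
}
```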

    Build take #2. The cxx_library preferred_linkage option

    Now let's change the build to request that libfoo is built as libfoo.a.

    cxx_library(
        name = "foo",
        srcs = ["foo.cpp"],
        preferred_linkage = "static", # New
        link_whole = True,
    )
    

    Clean and rebuild:

    $ buck2 clean
    ...
    $ buck2 build //...
    ...
    BUILD SUCCEEDED
    

    The program runs as before:

    $ ./buck-out/v2/gen/root/904931f735703749/__main__/main
    Hello World
    

    But:

    $ find . -name lib*.so; echo Done
    Done
    

    No shared library was built. Instead:

    $ find . -name lib*.a; echo Done
    ./buck-out/v2/tmp/root/904931f735703749/__foo__/archive/libfoo.pic.a
    ./buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.pic.a
    Done
    

    The static library libfoo.pic.a was built, which contains the object file:

    $ ar -t ./buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.pic.a
    foo.cpp.pic.o
    

    in which are defined:

    $ nm -C ./buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.pic.a | egrep '(foo.cpp.pic.o|world)'
    foo.cpp.pic.o:
    0000000000000000 T hello_world()
    0000000000000030 T goodbye_world()
    

    T = defined in the text (code) section. And both definitions were linked into the program:

    $ readelf -W --syms ./buck-out/v2/gen/root/904931f735703749/__main__/main | \
        c++filt | egrep '(Symbol table|Ndx|world)'
    Symbol table '.dynsym' contains 11 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    Symbol table '.symtab' contains 38 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
        32: 0000000000001940    40 FUNC    GLOBAL DEFAULT   14 hello_world()
        37: 0000000000001970    40 FUNC    GLOBAL DEFAULT   14 goodbye_world()
    

    But only in the .symtab, not in the .dynsym: the runtime linker does not need to define them. And the definition of goodbye_world() is dead weight.
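    The ar -t listing earlier in this build is simple enough to reproduce. Here is a minimal C++ sketch that walks an in-memory "!<arch>" archive and lists its member names. It assumes the GNU/System V format:

    - 60-byte member headers, with names terminated by "/";
    - a "/" symbol table and a "//" long-name table, both skipped as members;
    - members aligned to even offsets.

    list_members is a hypothetical name. Thin archives and BSD-style names are not handled.

```cpp
// Minimal sketch of `ar -t`: list member names of a GNU/System V archive.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> list_members(const std::string& ar) {
    std::vector<std::string> names;
    if (ar.compare(0, 8, "!<arch>\n") != 0) return names;  // not a regular archive
    std::string long_names;
    std::size_t pos = 8;
    while (pos + 60 <= ar.size()) {
        // Header fields: name[16] date[12] uid[6] gid[6] mode[8] size[10] "`\n"
        std::string name = ar.substr(pos, 16);
        std::size_t size = std::stoul(ar.substr(pos + 48, 10));
        std::size_t data = pos + 60;
        if (data + size > ar.size()) break;          // truncated archive
        name.erase(name.find_last_not_of(' ') + 1);  // strip space padding
        if (name == "//") {
            long_names = ar.substr(data, size);      // table of long names
        } else if (name != "/" && name != "/SYM64/") {  // skip symbol tables
            if (name.size() > 1 && name[0] == '/') {
                // "/N": name at offset N in the long-name table, ended by "/\n".
                std::size_t off = std::stoul(name.substr(1));
                name = long_names.substr(off, long_names.find("/\n", off) - off);
            } else if (!name.empty() && name.back() == '/') {
                name.pop_back();
            }
            names.push_back(name);
        }
        pos = data + size + (size % 2);  // member data is padded to even length
    }
    return names;
}
```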

    Check out the dynamic section of the new executable:

    $ readelf --dynamic buck-out/v2/gen/root/904931f735703749/__main__/main
    
    Dynamic section at offset 0xa20 contains 29 entries:
      Tag        Type                         Name/Value
     0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
     0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
     0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
     ...[cut]...
     
    

    It's the same as before except that:

     0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/./__main__shared_libs_symlink_tree]
     0x0000000000000001 (NEEDED)             Shared library: [lib_foo.so]
    

    is now gone. And see the new linkage arguments:

    $ cat ./buck-out/v2/gen/root/904931f735703749/__main__/main.linker.argsfile
    "-fuse-ld=lld"
    -o
    buck-out/v2/gen/root/904931f735703749/__main__/main
    buck-out/v2/gen/root/904931f735703749/__main__/__objects__/main.cpp.pic.o
    -Wl,--whole-archive
    buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.pic.a
    -Wl,--no-whole-archive
    

    libfoo.pic.a is statically linked, wrapped in --whole-archive ... --no-whole-archive.

    What's with this .pic. sub-extension as in libfoo.pic.a and foo.cpp.pic.o?

    That's a hint that the object file libfoo.pic.a(foo.cpp.pic.o) has been compiled as Position Independent Code (compiler option -fPIC). That makes it suitable for static linkage into a position-independent binary, i.e. a shared library, and not just into a program, which doesn't require PIC code. We don't in fact need PIC code for static linkage into our main program, but we've got it anyway. In the next build we'll see that go away.

    Build take #3. The cxx_binary link_style option

    Let's change the build again to say that static libraries are preferred in the linkage of the main program. The BUCK file now has:

    cxx_library(
        name = "foo",
        srcs = ["foo.cpp"],
        link_whole = True,
    )
    
    cxx_binary(
        name = "main",
        srcs = ["main.cpp"],
        link_style = "static", # New
        deps = [
        ':foo',
        ],
    )
    

    with the cxx_library reverted to original.

    Clean and rebuild:

    $ buck2 clean
    ...
    $ buck2 build //...
    ...
    BUILD SUCCEEDED
    

    The program runs as before:

    $ ./buck-out/v2/gen/root/904931f735703749/__main__/main
    Hello World
    

    But:

    $ find . -name lib*.a
    ./buck-out/v2/tmp/root/904931f735703749/__foo__/archive/libfoo.a
    ./buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.a
    

    now we've got the regular libfoo.a rather than libfoo.pic.a, and it contains the regular:

    $ ar -t ./buck-out/v2/gen/root/904931f735703749/__foo__/libfoo.a
    foo.cpp.o
    

    We told buck the main program prefers static libraries; programs don't need PIC code, so buck has ditched the -fPIC compilation. Nothing else is different from Build #2.


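    1. But it's possible to compile object files with finer granularity than the default (GCC/Clang options -ffunction-sections and -fdata-sections put each function or data object in its own section). That enables the linker, given the option --gc-sections, to discard definitions that come in from object files if it finally determines they're not needed, so they never appear in the global symbol table.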

    2. $ORIGIN is meaningful to the runtime linker. It means: the directory containing the file in which $ORIGIN is written.
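    The finer-grained linkage in footnote 1 can be pictured as a reachability problem. This toy C++ model is not real linker code. live_sections is a hypothetical name, and each section is identified with the one function it holds, as -ffunction-sections arranges. Under --gc-sections, only sections reachable from the entry point are kept, so goodbye_world() would be discarded. Real linkers start from the ELF entry symbol and other roots, not from main.

```cpp
// Toy model of --gc-sections: keep only sections reachable from the entry.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// refs: for each section (keyed by the function it holds), the symbols it uses.
std::set<std::string> live_sections(const std::map<std::string, std::vector<std::string>>& refs,
                                    const std::string& entry) {
    std::set<std::string> live;
    std::vector<std::string> work{entry};
    while (!work.empty()) {
        std::string s = work.back();
        work.pop_back();
        auto it = refs.find(s);
        // Skip symbols with no section here (e.g. from a shared library)
        // and sections already marked live.
        if (it == refs.end() || !live.insert(s).second) continue;
        for (const std::string& callee : it->second) work.push_back(callee);
    }
    return live;
}
```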