linker shared-libraries dynamic-library

On linking of shared libraries, are they really final, and if so, why?


I am trying to understand more about linking and shared libraries.

Ultimately, I wonder whether it's possible to add a method to a shared library. For instance, suppose one has a source file a.c and a library lib.so (without its source). Let's furthermore assume, for simplicity, that a.c defines a single method whose name is not present in lib.so. I thought it might be possible, at link time, to link a.o with lib.so while instructing the linker to create newLib.so and forcing it to export all methods/variables in lib.so, so that newLib.so is basically lib.so with the added method from a.o.

More generally, if one has some source file depending on a shared library, can one create a single output file (library or executable) that no longer depends on the shared library? (That is, all the relevant methods/variables from the library would have been exported/linked/inlined into the new output file, making the dependency void.) If that's not possible, what is technically preventing it?

A somewhat similar question has been asked here: Merge multiple .so shared libraries. One of the replies includes the following text: "If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them." without explaining the technical details. Was it a mistake, or does it hold? If it holds, how does one do it?


Solution

  • Once you have a shared library libfoo.so, the only ways you can use it in the linkage of anything else are:

    Link a program that dynamically depends on it, e.g.

    $ gcc -o prog bar.o ... -lfoo
    

    Or, link another shared library that dynamically depends on it, e.g.

    $ gcc -shared -o libbar.so bar.o ... -lfoo
    

    In either case the product of the linkage, prog or libbar.so, acquires a dynamic dependency on libfoo.so. This means that prog|libbar.so has information inscribed in it by the linker that instructs the OS loader, at runtime, to find libfoo.so, load it into the address space of the current process and bind the references in prog|libbar.so to libfoo's exported symbols to the addresses of their definitions.

    So libfoo.so must continue to exist as well as prog|libbar.so. It is not possible to link libfoo.so with prog|libbar.so in such a way that libfoo.so is physically merged into prog|libbar.so and is no longer a runtime dependency.

    It doesn't matter whether or not you have the source code of the other linkage input files - bar.o ... - that depend on libfoo.so. The only kind of linkage you can do with a shared library is dynamic linkage.

    This is in complete contrast with the linkage of a static library.

    You wonder about the statement in this answer where it says:

    If you have access to either source or object files for both libraries, it is straightforward to compile/link a combined SO from them.

    The author is just observing that if I have source files

    foo_a.c foo_b.c... bar_a.c bar_b.c...
    

    which I compile to the corresponding object files:

    foo_a.o foo_b.o... bar_a.o bar_b.o...
    

    or if I simply have those object files, then as well as (or instead of) linking them into two shared libraries:

    $ gcc -shared -o libfoo.so foo_a.o foo_b.o...
    $ gcc -shared -o libbar.so bar_a.o bar_b.o...
    

    I could link them into one:

    $ gcc -shared -o libfoobar.so foo_a.o foo_b.o... bar_a.o bar_b.o...
    

    which would have no dependency on libfoo.so or libbar.so even if they exist.

    And although that could be straightforward, the claim could also be false. If there is any symbol name that is globally defined in any of foo_a.o foo_b.o... and also globally defined in any of bar_a.o bar_b.o... then that will not matter to the separate linkages of libfoo.so and libbar.so (and the name need not be dynamically exported by either of them). But the linkage of libfoobar.so will fail with a multiple definition of that name.

    If we build a shared library libbar.so that depends on libfoo.so and has itself been linked with libfoo.so:

    $ gcc -shared -o libbar.so bar.o ... -lfoo
    

    and we then want to link a program with libbar.so, we can do that in such a way that we don't need to mention its dependency libfoo.so:

    $ gcc -o prog main.o ... -lbar -Wl,-rpath=<path/to/libfoo.so>
    

    See this answer to follow that up. But this doesn't change the fact that libbar.so has a runtime dependency on libfoo.so.

    If that's not possible, what is technically preventing it?

    What technically prevents linking a shared library with some program or shared library targ in a way that physically merges it into targ is that a shared library (like a program) is not the sort of thing that a linker knows how to physically merge into its output file.

    Input files that the linker can physically merge into targ need to have structural properties that guide the linker in doing that merging. That is the structure of object files. They consist of named input sections of object code or data that are tagged with various attributes. Roughly speaking, the linker cuts up the object files into their sections and distributes them into output sections of the output file according to their attributes, and makes binary modifications to the merged result to resolve static symbol references or enable the OS loader to resolve dynamic ones at runtime.

    This is not a reversible process. The linker can't consume a program or shared library and reconstruct the object files from which it was made to merge them again into something else.

    But that's really beside the point. When input files are physically merged into targ, that is called static linkage. When input files are just externally referenced in targ to make the OS loader map them into a process it has launched for targ, that is called dynamic linkage. Technical development has given us a file-format solution to each of these needs: object files for static linkage, shared libraries for dynamic linkage. Neither can be used for the purpose of the other.