gcc linker weak-references translation-unit

Why is the weak symbol defined in the same .a file but in a different .o file not used as a fallback?


I have the following tree:

.
├── func1.c
├── func2.c
├── main.c
├── Makefile
├── override.c
└── weak.h

func1.c

#include <stdio.h>

void func2(void);

void func1 (void)
{
    func2();
}

func2.c

#include <stdio.h>

void func2 (void)
{
    printf("in original func2()\n");
}

main.c

#include <stdio.h>

void func1();

void func2();

int main(void)
{
    func1();
    return 0;
}

override.c

#include <stdio.h>

void func2 (void)
{
    printf("in override func2()\n");
}

weak.h

__attribute__((weak))
void func2 (void); // <==== weak attribute on declaration

Makefile

ALL:
    rm -f *.a *.o
    gcc -c override.c -o override.o
    gcc -c func1.c -o func1.o -include weak.h # weak.h is used to tell func1.c that func2() is weak
    gcc -c func2.c -o func2.o
    ar cr all_weak.a func1.o func2.o
    gcc main.c all_weak.a override.o -o main

This all runs fine and prints:

in override func2()

But if I comment out the override version of func2() in override.c, as below:

#include <stdio.h>

// void func2 (void)
// {
//     printf("in override func2()\n");
// }

The build passes, but the final binary fails at runtime with:

Segmentation fault (core dumped)

In the symbol table of ./main, func2 is an unresolved weak symbol:

000000000000065b T func1
                 w func2 <=== func2 is a weak symbol with no default implementation

Why didn't it fall back to the func2() in the original func2.c? After all, all_weak.a already contains an implementation in func2.o:

func1.o:
0000000000000000 T func1
                 w func2 <=== func2 is [w]eak with no implementation
                 U _GLOBAL_OFFSET_TABLE_

func2.o:
0000000000000000 T func2   <=========== HERE! a strong symbol!
                 U _GLOBAL_OFFSET_TABLE_
                 U puts

ADD 1

It seems the arrangement of translation units also affects whether the weak function falls back.

If I put the func2() implementation into the same file (translation unit) as func1(), as below, the fallback to the original func2() works.

func1.c

#include <stdio.h>

void func2 (void)
{
    printf("in original func2()\n");
}

void func1 (void)
{
    func2();
}

The symbols of all_weak.a are:

func1.o:
0000000000000013 T func1
0000000000000000 W func2 <==== func2 is still [W]eak but has a default implementation
                 U _GLOBAL_OFFSET_TABLE_
                 U puts

The code can fall back to the original func2() correctly if no override is provided.

This link also mentions that, to work with the GCC alias attribute, the translation unit arrangement must be considered as well.

alias ("target") The alias attribute causes the declaration to be emitted as an alias for another symbol, which must be specified. For instance,

void __f () { /* Do something. */; }
void f () __attribute__ ((weak, alias ("__f")));

defines f to be a weak alias for __f. In C++, the mangled name for the target must be used. It is an error if __f is not defined in the same translation unit.
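To make the same-translation-unit requirement concrete, here is a minimal sketch of a weak alias used as a built-in fallback. It assumes GCC or Clang on an ELF target; the names default_func2, default_calls and call_func2 are hypothetical and do not appear in the project above.

```c
#include <stdio.h>

static int default_calls;   /* counts how often the default body ran */

/* Hypothetical default implementation. The alias target must be defined
 * in this same translation unit, as the quoted documentation requires. */
void default_func2(void)
{
    default_calls++;
    puts("in default func2()");
}

/* func2 is emitted as a weak alias for default_func2. A strong func2
 * from another object file in the link would take precedence. */
void func2(void) __attribute__((weak, alias("default_func2")));

/* Calls func2 and reports how many times the default body has run. */
int call_func2(void)
{
    func2();
    return default_calls;
}
```

When no other object in the link defines func2, the call in call_func2 lands in default_func2, so the counter increases by one per call.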

According to Wikipedia:

The nm command identifies weak symbols in object files, libraries, and executables. On Linux a weak function symbol is marked with "W" if a weak default definition is available, and with "w" if it is not.

ADD 2 - 7:54 PM 8/7/2021

(Huge thanks to @n. 1.8e9-where's-my-share m. )

I tried these:

Now these files look like this:

func2.c

#include <stdio.h>

__attribute__((weak))
void func2 (void)
{
    printf("in original func2()\n");
}

Makefile:

ALL:
    rm -f *.a *.o
    gcc -c override.c -o override.o
    gcc -c func1.c -o func1.o
    gcc -c func2.c -o func2.o
    ar cr all_weak.a func1.o func2.o
    gcc main.c all_weak.a -o main_original   # <=== no override.o
    gcc main.c all_weak.a override.o -o main_override # <=== override.o

The output is this:

xxx@xxx-host:~/weak_fallback$ ./main_original 
in original func2() <===== successful fall back

xxx@xxx-host:~/weak_fallback$ ./main_override
in override func2() <===== successful override

So, the conclusion is:

And a quotation from here:

The linker will only search through libraries to resolve a reference if it cannot resolve that reference after searching all input objects. If required, the libraries are searched from left to right according to their position on the linker command line. Objects within the library will be searched by the order in which they were archived. As soon as armlink finds a symbol match for the reference, the searching is finished, even if it matches a weak definition.

The ELF ABI section 4.6.1.2 says: "A weak definition does not change the rules by which object files are selected from libraries. However, if a link set contains both a weak definition and a non-weak definition, the non-weak definition will always be used." The "link set" is the set of objects that have been loaded by the linker. It does not include objects from libraries that are not required.

Therefore archiving two objects where one contains the weak definition of a given symbol and the other contains the non-weak definition of that symbol, into a library or separate libraries, is not recommended.

ADD 3 - 8:47 AM 8/8/2021

As @n.1.8e9-where's-my-sharem commented:

Comment 1:

"weak" on a symbol which is not a definition means "do not resolve this symbol at link time". The linker happily obeys.

Comment 2:

"on a symbol which is not a definition" is wrong, should read "on an undefined symbol".

I think by "on an undefined symbol", he means "an undefined symbol within current translation unit". In my case, when I:

These essentially tell the linker not to resolve the func2() consumed in the translation unit func1.c. But it seems this "do not" only applies to .a files. If I link another .o file alongside the .a file, the linker is still willing to resolve func2(). And if func2() is also defined in func1.c, the linker resolves it too. Subtle it is!

(So far, all these conclusions are based on my experimental results, and they are hard to summarize. If anyone can find an authoritative source, please feel free to comment or reply. Thanks!)

(Thanks to n. 1.8e9-where's-my-share m.'s comment.)

And a related thread:

Override a function call in C

Some afterthought - 9:55 PM 8/8/2021

There's no rocket science behind these subtle behaviors. It just depends on how the linker is implemented. Sometimes the documentation is vague, and you have to try it and deal with it. (If there is some big idea behind all this, please correct me; I would be more than grateful.)


Solution

  • "these subtle behaviors"

    There isn't really anything subtle here.

    1. A weak definition means: use this symbol unless another strong definition is also present, in which case use the other symbol.

      Normally two same-named symbols result in a multiply-defined link error, but when all but one of the definitions are weak, no multiply-defined error is produced.

    2. A weak (unresolved) reference means: don't consider this symbol when deciding whether to pull an object which defines this symbol out of an archive library (an object may still be pulled in if it satisfies a different strong undefined symbol).

      Normally, if a symbol is unresolved after all objects are selected, the linker reports an unresolved symbol error. But if the unresolved symbol is weak, the error is suppressed.

    That's really all there is to it.
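Point 2 has a practical consequence on ELF targets: an unresolved weak function reference ends up with address 0, so calling it unconditionally crashes, as in the question. Below is a minimal sketch of the usual guard, assuming GCC or Clang on an ELF platform. The helper name call_func2_if_present is hypothetical.

```c
#include <stdio.h>

/* Weak reference: declared weak and deliberately not defined in this file. */
__attribute__((weak)) void func2(void);

/* Returns 1 if func2 was linked in and called, 0 if the reference
 * stayed unresolved. */
int call_func2_if_present(void)
{
    if (func2) {            /* an unresolved weak function has address 0 */
        func2();
        return 1;
    }
    puts("func2 was not linked in; skipping the call");
    return 0;
}
```

Built with no object that defines func2, the helper takes the second branch instead of jumping to address 0. Linking in an object that defines func2 makes it take the first branch.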

    Update:

    You are repeating an incorrect understanding in the comments.

    "What makes me feel subtle is, for a weak reference, the linker doesn't pull an object from an archive library, but still check a standalone object file."

    This is entirely consistent with the answer above. When a linker deals with an archive library, it has to make a decision: whether or not to select the contained foo.o into the link. It is that decision that is affected by the type of reference.

    When bar.o is given on the link line as a "standalone object file", the linker makes no decisions about it -- bar.o will be selected into the link.

    "And if that object happens to contain a definition for the weak reference, will the weak reference be also resolved by the way?"

    Yes.

    "Even the weak attribute tells the linker not to."

    This is the apparent root of misunderstanding: the weak attribute doesn't tell the linker not to resolve the reference; it only tells the linker (pardon repetition) "don't consider this symbol when deciding whether to pull an object which defines this symbol out of archive library".

    "I think it's all about whether or not an object containing a definition for that weak reference is pulled in for linking."

    Correct.

    "Be it a standalone object or from an archive lib."

    Wrong: a standalone object is always selected into the link.
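The two rules can also be written down as a small toy model. This is only a sketch of the selection logic described in this answer, not how any real linker is implemented: it ignores command-line order and everything else a linker does, and every name in it (sym_kind, obj, link_and_resolve, resolve_func2) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: all names here are hypothetical, not linker internals. */
enum sym_kind { REF_STRONG, REF_WEAK, DEF_STRONG, DEF_WEAK };

struct sym { const char *name; enum sym_kind kind; };

struct obj {
    const char *file;
    int from_archive;   /* 0: named on the link line, 1: archive member */
    struct sym syms[3];
    int nsyms;
    int selected;       /* 1 once the object is part of the link set */
};

static int has(const struct obj *o, const char *name, enum sym_kind kind)
{
    for (int i = 0; i < o->nsyms; i++)
        if (o->syms[i].kind == kind && strcmp(o->syms[i].name, name) == 0)
            return 1;
    return 0;
}

/* Selected object that defines `name`; a strong definition beats a weak one. */
static const struct obj *provider(const struct obj *objs, int n, const char *name)
{
    const struct obj *weak = NULL;
    for (int i = 0; i < n; i++) {
        if (!objs[i].selected) continue;
        if (has(&objs[i], name, DEF_STRONG)) return &objs[i];
        if (!weak && has(&objs[i], name, DEF_WEAK)) weak = &objs[i];
    }
    return weak;
}

/* True if a selected object strongly references `name` and nothing defines it yet. */
static int strongly_needed(const struct obj *objs, int n, const char *name)
{
    if (provider(objs, n, name)) return 0;
    for (int i = 0; i < n; i++)
        if (objs[i].selected && has(&objs[i], name, REF_STRONG))
            return 1;
    return 0;
}

/* Build the link set, then return the file providing `name` (NULL if unresolved). */
const char *link_and_resolve(struct obj *objs, int n, const char *name)
{
    assert(objs != NULL && n > 0);
    for (int i = 0; i < n; i++)
        objs[i].selected = !objs[i].from_archive;  /* standalone: always selected */

    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (objs[i].selected) continue;
            for (int s = 0; s < objs[i].nsyms; s++) {
                enum sym_kind k = objs[i].syms[s].kind;
                /* Only a strong unresolved reference pulls a member in. */
                if ((k == DEF_STRONG || k == DEF_WEAK) &&
                    strongly_needed(objs, n, objs[i].syms[s].name)) {
                    objs[i].selected = changed = 1;
                    break;
                }
            }
        }
    }
    const struct obj *p = provider(objs, n, name);
    return p ? p->file : NULL;
}

/* The question's link: func1.o references func2 as `ref`, func2.o defines it as `def`. */
const char *resolve_func2(enum sym_kind ref, enum sym_kind def, int with_override)
{
    struct obj objs[] = {
        { "main.o",     0, { { "func1", REF_STRONG } }, 1, 0 },
        { "func1.o",    1, { { "func1", DEF_STRONG }, { "func2", ref } }, 2, 0 },
        { "func2.o",    1, { { "func2", def } }, 1, 0 },
        { "override.o", 0, { { "func2", DEF_STRONG } }, 1, 0 },
    };
    return link_and_resolve(objs, with_override ? 4 : 3, "func2");
}
```

In this model, resolve_func2(REF_WEAK, DEF_STRONG, 0) returns NULL: func2.o is never pulled out of the archive, which corresponds to the unresolved weak func2 in the question. With override.o on the link line the same call resolves to override.o. Using a strong reference and a weak definition, as in ADD 2, gives func2.o without the override and override.o with it.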