I am after some suggestions as to how to go about debugging a significant problem that I cannot reduce to a minimal example.
The problem: I compile my application which links to a number of different libraries. The flags include:
-static-libstdc++ -static-libgcc -pipe -std=c++1z -fno-PIC -flto=10 -m64 -O3 -flto=10 -fuse-linker-plugin -fuse-ld=gold -UNDEBUG -lrt -ldl
The compiler is gcc-7.3.0, compiled against binutils-2.30. Boost is compiled with the same flags as the rest of the program, and linked statically.
When the program is linked, I get various "relocation refers to discarded section" warnings, both in my own code and in boost. For instance:
/tmp/ccq2Ddku.ltrans13.ltrans.o:<artificial>:function boost::system::(anonymous namespace)::generic_error_category::message(int) const: warning: relocation refers to discarded section
Then when I run the program, it segfaults on destruction, with the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7345a49 in __run_exit_handlers () from /lib64/libc.so.6
#2 0x00007ffff7345a95 in exit () from /lib64/libc.so.6
#3 0x00007ffff732eb3c in __libc_start_main () from /lib64/libc.so.6
#4 0x000000000049b3e3 in _start ()
The function pointer being called is 0x0 (null).
If I remove -static-libstdc++, the linker warnings and runtime segfault go away.
If I change from c++1z to c++14, the linker warnings and runtime segfault go away.
If I remove -flto, the linker warnings and runtime segfault go away.
If I add "-g" to the compile flags, the linker warnings and runtime segfault go away.
I have tried asking gold for extra debugging, by specifying -Wl,--debug=all, but it tells me seemingly nothing relevant.
If I take a small section of the code that appears relevant, and compile and link it separately but against the same boost libraries (i.e. attempting to produce a minimal example), there are no linker warnings, and the program runs to completion without issues.
Help! What can I do to narrow the problem down?
This warning usually indicates an inconsistency in the contents of a COMDAT group between two compilation units. If the compiler emits a COMDAT group G with symbol A defined in one compilation unit, but emits the same group G with symbols A and B defined in a second compilation unit, the linker will keep group G from the first compilation unit and discard group G from the second. Any references to symbol B from outside the group in the second compilation unit will produce this warning.
The cause is usually a bug in the compiler, and using -flto makes it that much harder to diagnose. In this case, your second compilation unit is the result of link-time optimization (hence the *.ltrans.o file name). With LTO, it's quite believable that many of the changes you've mentioned will make the problem go away.
The very latest version of gold on the master branch of the binutils git repo has a new [-Wl,]--debug=plugin option, which will save a log and all the temporary .ltrans.o files. Having the log and those files, along with all the original input files (which you can get a list of by adding the [-Wl,]-t option), should help isolate the problem better.
The latest version of gold will also print the symbol referenced by the relocation. For a local symbol, it will show the symbol index; use readelf -s to get more info about the symbol. For a global symbol, it will show the name; you can add the --no-demangle option for the exact name.
If it's a local symbol, the problem is almost certainly the compiler. References from outside a comdat group to a local symbol in the group are strictly forbidden.
If it's a global symbol, it could be either a compiler problem or a one-definition rule (ODR) violation in your sources. You'll need to identify the comdat group in the named object file, find its key symbol, then find the object file that provided the definition kept by the linker (the -y option will help), and compare the symbols defined in those groups by the two objects. These steps should help:
(1) Starting from the error message:
b.o(.data+0x0): warning: relocation refers to symbol "two" defined in discarded section
(2) Look for symbol "two" in b.o:
$ readelf -sW b.o | grep two
7: 0000000000000008 0 NOTYPE WEAK DEFAULT 6 two
The next-to-last field ("6") is the section number where "two" is defined.
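If you have many symbols to look up, step 2 can be scripted. Here is a minimal sketch in Python, assuming the eight-column layout of readelf -sW shown above (Num, Value, Size, Type, Bind, Vis, Ndx, Name). The function name symbol_section is hypothetical, and the sample text is just the line from this example, not output from your build:

```python
def symbol_section(readelf_s_output, name):
    """Return the Ndx field (as a string) for symbol `name`, or None.

    Assumes the column layout printed by `readelf -sW`:
    Num: Value Size Type Bind Vis Ndx Name
    """
    for line in readelf_s_output.splitlines():
        fields = line.split()
        # A symbol row has at least 8 fields and starts with "<number>:".
        is_symbol_row = (
            len(fields) >= 8
            and fields[0].endswith(":")
            and fields[0][:-1].isdigit()
        )
        if is_symbol_row and fields[7] == name:
            return fields[6]  # may also be UND, ABS or COM
    return None


# Sample line copied from the example above.
sample = "     7: 0000000000000008     0 NOTYPE  WEAK   DEFAULT    6 two"
print(symbol_section(sample, "two"))
```

The result is kept as a string because the Ndx column is not always a number.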
(3) Verify that section 6 is in fact a comdat group:
$ readelf -SW b.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 6] .one PROGBITS 0000000000000000 000058 000018 00 WAG 0 0 1
The "G" in the sh_flags field ("Flg") indicates the section belongs to a comdat group.
(4) Find the comdat group containing the section:
$ readelf -g b.o
COMDAT group section [ 1] `.group' [one] contains 1 sections:
[Index] Name
[ 6] .one
This shows us that section 6 is a member of group section 1.
(5) Find the key symbol for that group:
$ readelf -SW b.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 1] .group GROUP 0000000000000000 000040 000008 04 7 8 4
The sh_info field ("Inf") tells us the key symbol is symbol #8, which is "one". (That should match the name shown in brackets in step 4.)
$ readelf -sW b.o
Num: Value Size Type Bind Vis Ndx Name
8: 0000000000000000 0 NOTYPE WEAK DEFAULT 6 one
(6) Now you can add the -y one option to your link to find which objects provided a definition of "one":
$ gcc -Wl,-y,one ...
a.o: definition of one
b.o: definition of one
The first one listed (a.o) is the one that gold keeps; it will discard all subsequent comdat groups with the same key symbol.
If you use the same techniques to examine the comdat group that defines "one" in a.o, and compare the symbols that belong to that group with those that belong to the group in b.o, that should give you more clues.
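That final comparison can be sketched as a small script. The following Python is only an illustration, assuming the text layouts of readelf -g and readelf -sW shown in the steps above. The helper names (group_sections, symbols_in_sections) are hypothetical, the b.o text is taken from this example, and the a.o text is invented sample data standing in for the object whose group the linker kept:

```python
import re

# Header line printed by `readelf -g`, e.g.
# COMDAT group section [    1] `.group' [one] contains 1 sections:
GROUP_RE = re.compile(
    r"COMDAT group section \[\s*\d+\] `[^']*' \[(.*)\] contains \d+ sections?:"
)
# Member line, e.g. "   [    6]   .one"
MEMBER_RE = re.compile(r"\s*\[\s*(\d+)\]\s+\S+")


def group_sections(readelf_g_output):
    """Map each group's key symbol to the set of member section indices."""
    groups = {}
    current = None
    for line in readelf_g_output.splitlines():
        header = GROUP_RE.search(line)
        if header:
            current = groups.setdefault(header.group(1), set())
            continue
        member = MEMBER_RE.match(line)
        if member and current is not None:
            current.add(member.group(1))
    return groups


def symbols_in_sections(readelf_s_output, sections):
    """Names of symbols whose Ndx column is one of `sections`."""
    names = set()
    for line in readelf_s_output.splitlines():
        f = line.split()
        if len(f) >= 8 and f[0].endswith(":") and f[0][:-1].isdigit():
            if f[6] in sections:
                names.add(f[7])
    return names


# b.o: text from the example in this answer.
B_GROUPS = """
COMDAT group section [    1] `.group' [one] contains 1 sections:
   [Index]    Name
   [    6]   .one
"""
B_SYMS = """
     7: 0000000000000008     0 NOTYPE  WEAK   DEFAULT    6 two
     8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT    6 one
"""
# a.o: hypothetical sample data, a group that defines only "one".
A_GROUPS = """
COMDAT group section [    1] `.group' [one] contains 1 sections:
   [Index]    Name
   [    4]   .one
"""
A_SYMS = """
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT    4 one
"""

kept = symbols_in_sections(A_SYMS, group_sections(A_GROUPS)["one"])
discarded = symbols_in_sections(B_SYMS, group_sections(B_GROUPS)["one"])
# Symbols defined only in the discarded copy of the group.
missing = discarded - kept
print(sorted(missing))
```

Any name left in missing is defined only in the discarded copy of the group, so references to it from outside the group are the ones the linker warns about. In a real run you would feed the functions the captured output of readelf -g and readelf -sW for each object instead of the sample strings.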