c++ compiler-optimization debug-symbols

Reducing the footprint of debug symbols (executable is bloated to 4 GB)


The basic problem is that my built executable is 4 GB in size with debug symbols turned on (between 75 MB and 300 MB without debug symbols, depending on the optimization level). How can I diagnose where all these symbols are coming from, and which ones are the biggest offenders in terms of space? I have found some questions on reducing the non-debug executable size (though they were not terribly illuminating), but here I am mainly concerned with reducing the debug symbol clutter. The executable is so large that gdb takes a significant amount of time to load all the symbols, which hinders debugging. Perhaps reducing the code bloat is the fundamental task, but I would first like to know where my 4 GB is being spent.

Running the executable through 'size --format=SysV' I get the following output:

section                    size       addr
.interp                      28    4194872
.note.ABI-tag                32    4194900
.note.gnu.build-id           36    4194932
.gnu.hash                714296    4194968
.dynsym                 2728248    4909264
.dynstr                13214041    7637512
.gnu.version             227354   20851554
.gnu.version_r              528   21078912
.rela.dyn                 37680   21079440
.rela.plt                 15264   21117120
.init                        26   21132384
.plt                      10192   21132416
.text                  25749232   21142608
.fini                         9   46891840
.rodata                 3089441   46891872
.eh_frame_hdr            584228   49981316
.eh_frame               2574372   50565544
.gcc_except_table       1514577   53139916
.init_array                2152   56753888
.fini_array                   8   56756040
.jcr                          8   56756048
.data.rel.ro             332264   56756064
.dynamic                    992   57088328
.got                        704   57089320
.got.plt                   5112   57090048
.data                     22720   57095168
.bss                    1317872   57117888
.comment                     44          0
.debug_aranges          2978704          0
.debug_info           278337429          0
.debug_abbrev           1557345          0
.debug_line            13416850          0
.debug_str           3620467085          0
.debug_loc            236168202          0
.debug_ranges          37473728          0
Total                4242540803

From this I guess we can see that '.debug_str' takes up ~3.6 GB. I am not entirely sure what '.debug_str' contains, but I guess it might literally be the string names of the debug symbols? So is this telling me that the de-mangled names of my symbols are just insanely big? How can I figure out which ones and fix them?
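As a rough sketch, the section table above can be ranked by size with a small awk filter. The function name `rank_sections` is made up for illustration, and `my_executable` stands in for the real binary; it assumes the three-column layout that `size --format=SysV` printed above.

```shell
# Rank ELF sections by size, reading `size --format=SysV` output on stdin.
# Prints: size in bytes, share of the summed section sizes, section name.
# rank_sections is a hypothetical helper name, not an existing tool.
rank_sections() {
    awk '
        # Section rows look like: <name> <size> <addr>; names start with a dot.
        # The header row and the Total row are skipped by this test.
        $1 ~ /^\./ && $2 ~ /^[0-9]+$/ { size[$1] = $2; total += $2 }
        END {
            for (name in size)
                printf "%s %.1f%% %s\n", size[name], 100 * size[name] / total, name
        }
    ' | sort -rn
}

# Usage (my_executable is a placeholder for the real binary):
#   size --format=SysV my_executable | rank_sections | head
```

Applied to the table above, this should list .debug_str first at roughly 85% of the total, followed by .debug_info and .debug_loc.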

I guess I can somehow do something with 'nm', directly inspecting the symbol names, but the output is enormous and I'm not sure how best to search it. Are there any tools to do this kind of analysis?
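One way to make the 'nm' output searchable might be to sort the symbol names by length and look only at the longest ones. This is a minimal sketch: `longest_symbols` is a made-up helper name, and it assumes mangled names (no -C), so that each name is the last whitespace-separated field. Note that nm reads the symbol table rather than the DWARF sections, so it is only a proxy for what ends up in .debug_str.

```shell
# Read `nm` output on stdin and print the N longest symbol names,
# each prefixed by its length in characters (default N = 20).
# longest_symbols is a hypothetical helper name.
longest_symbols() {
    # NF >= 2 skips blank lines; the name is the last field for mangled output.
    awk 'NF >= 2 { print length($NF), $NF }' | sort -rn | head -n "${1:-20}"
}

# Usage (my_executable is a placeholder for the real binary):
#   nm my_executable | longest_symbols 10
```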

The compiler used was 'c++ (GCC) 4.9.2'. And I guess I should mention that I am working in a Linux environment.


Solution

  • I have tracked down the main culprit by doing the following, based mostly on John Zwinck's answer. Essentially I followed his suggestion to run "strings" on the executable and analyzed the output.

    strings my_executable > exec_strings.txt
    

    I then sorted the output mostly following mindriot's method:

    cat exec_strings.txt | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > exec_strings_sorted.txt
    

    and had a look at the longest strings. Indeed it all seemed to be some insane template bloat, from a particular library. I then did a little more counting like:

    cat exec_strings.txt | wc -l
    2928189
    cat exec_strings.txt | grep <culprit_libname> | wc -l
    1108426
    

    This showed that of the approximately 2.9 million strings extracted, about 1.1 million came from this library. Finally, doing

    cat exec_strings.txt | wc -c
    3659369876
    cat exec_strings.txt | grep <culprit_libname> | wc -c
    3601918899
    

    it became apparent that these 1.1 million strings are all extremely long: together they account for about 3.6 GB of the roughly 3.66 GB of extracted string data (about 98%), so they make up the great bulk of the debug symbol garbage. At least now I can focus on this one library while trying to remove the root of the problem.
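The counting steps above could also be folded into a single pass over the strings dump. This is only a sketch under some assumptions: `string_share` is a made-up helper name, the pattern argument stands in for the culprit library name, and bytes are counted the way `wc -c` does (one extra byte per line for the newline).

```shell
# Summarise how much of a `strings` dump matches a pattern.
#   $1: file with one extracted string per line (e.g. exec_strings.txt)
#   $2: extended regex to match (placeholder for the culprit library name)
# string_share is a hypothetical helper name.
string_share() {
    LC_ALL=C awk -v pat="$2" '
        { n++; bytes += length($0) + 1 }             # +1 for the newline, as wc -c counts it
        $0 ~ pat { mn++; mbytes += length($0) + 1 }  # same tally for matching lines only
        END {
            printf "lines: %d of %d\n", mn, n
            printf "bytes: %.0f of %.0f (%.1f%%)\n", mbytes, bytes, (bytes ? 100 * mbytes / bytes : 0)
        }
    ' "$1"
}

# Usage:
#   strings my_executable > exec_strings.txt
#   string_share exec_strings.txt 'culprit_libname'
```

With the numbers reported above, the byte share would come out at roughly 98%, which matches the conclusion that a single library dominates the debug strings.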