So the basic problem is that my executable is 4 GB with debug symbols turned on (between 75 MB and 300 MB without debug symbols, depending on the optimization level). How can I diagnose where all these symbols come from, and which are the biggest offenders in terms of space? I have found some questions on reducing the size of non-debug executables (though they have not been terribly illuminating), but here I am mainly concerned with reducing the debug-symbol clutter. The executable is so large that gdb takes a significant amount of time to load all the symbols, which is hindering debugging. Perhaps reducing the code bloat is the fundamental task, but first I would like to know where my 4 GB is being spent.
Running the executable through 'size --format=SysV', I get the following output:
section                   size      addr
.interp                     28   4194872
.note.ABI-tag               32   4194900
.note.gnu.build-id          36   4194932
.gnu.hash               714296   4194968
.dynsym                2728248   4909264
.dynstr               13214041   7637512
.gnu.version            227354  20851554
.gnu.version_r             528  21078912
.rela.dyn                37680  21079440
.rela.plt                15264  21117120
.init                       26  21132384
.plt                     10192  21132416
.text                 25749232  21142608
.fini                        9  46891840
.rodata                3089441  46891872
.eh_frame_hdr           584228  49981316
.eh_frame              2574372  50565544
.gcc_except_table      1514577  53139916
.init_array               2152  56753888
.fini_array                  8  56756040
.jcr                         8  56756048
.data.rel.ro            332264  56756064
.dynamic                   992  57088328
.got                       704  57089320
.got.plt                  5112  57090048
.data                    22720  57095168
.bss                   1317872  57117888
.comment                    44         0
.debug_aranges         2978704         0
.debug_info          278337429         0
.debug_abbrev          1557345         0
.debug_line           13416850         0
.debug_str          3620467085         0
.debug_loc           236168202         0
.debug_ranges         37473728         0
Total               4242540803
From this I gather that '.debug_str' takes up ~3.6 GB, about 85% of the total. I'm not 100% sure what '.debug_str' holds, but I guess it might literally be the string names of the debug symbols? So is this telling me that the demangled names of my symbols are just insanely big? How can I figure out which ones they are, and fix them?
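To rank the sections by their share of the file instead of eyeballing the table, the `size` output can be post-processed. A minimal sketch in POSIX shell and awk, assuming the SysV layout shown above (one section per line, name first, size second); `section_shares` is a hypothetical helper name:

```shell
# Rank ELF sections by their share of the file, reading the output of
# `size --format=SysV <binary>` on standard input.
section_shares() {
  awk '
    # Section rows start with a dot (".text", ".debug_str", ...); the
    # file-name line, the column header and the "Total" row are skipped.
    $1 ~ /^\./ { size[$1] = $2; total += $2 }
    END {
      # %.0f instead of %d avoids integer-width limits in some awk
      # implementations; .debug_str alone is larger than 2^31 bytes here.
      for (s in size)
        printf "%6.2f%%  %12.0f  %s\n", 100 * size[s] / total, size[s], s
    }
  ' | sort -rn
}
```

Usage: `size --format=SysV my_executable | section_shares`. For the table above, `.debug_str` should come out on top at roughly 85%, followed by `.debug_info` and `.debug_loc`.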
I guess I could do something with 'nm' to inspect the symbol names directly, but its output is enormous and I'm not sure how best to search it. Are there any tools for this kind of analysis?
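One way to make the `nm` output searchable is to rank it by line length. With `nm -C` (demangled names), every line on a given binary should start with the same fixed-width address and one-letter type field, so the longest lines are the symbols with the longest names. These names come from the symbol table, which is not necessarily the same set of strings as `.debug_str`. A minimal sketch; `longest_lines` is a hypothetical helper and its default of 20 is arbitrary:

```shell
# Print the N longest lines of standard input (default 20), longest first,
# each prefixed with its length as counted by awk (characters).
longest_lines() {
  awk '{ print length($0), $0 }' | sort -k1,1nr | head -n "${1:-20}"
}
```

Usage: `nm -C my_executable | longest_lines 10`. The same helper works on any one-string-per-line file, such as the output of `strings`.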
The compiler is 'c++ (GCC) 4.9.2', and I am working in a Linux environment.
So I have tracked down the main culprit as follows, mostly based on John Zwinck's answer. Essentially, I followed his suggestion to run 'strings' on the executable and analyzed the output:
strings my_executable > exec_strings.txt
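`strings` scans the entire file, so its output mixes `.debug_str` with `.dynstr`, `.rodata` and every other section. A narrower alternative (not the approach used here) is binutils' `readelf --string-dump=.debug_str`, which prints each string of that one section prefixed by its offset in brackets. The sketch below assumes that `  [ offset]  string` layout and strips the prefix so the result can go through the same sorting and counting; `debug_strings` is a hypothetical helper name:

```shell
# Print the contents of .debug_str one string per line, reading the
# output of `readelf --string-dump=.debug_str <binary>` on standard input.
# Assumes readelf's "  [ offset]  string" layout; header lines that lack
# the bracketed offset are dropped because of sed -n.
debug_strings() {
  sed -n 's/^ *\[ *[0-9a-f]*]  *//p'
}
```

Usage: `readelf --string-dump=.debug_str my_executable | debug_strings > debug_strings.txt`.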
I then sorted the output by line length, mostly following mindriot's method:
cat exec_strings.txt | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > exec_strings_sorted.txt
and looked at the longest strings. Indeed, it all seemed to be insane template bloat from one particular library. I then did a little more counting:
cat exec_strings.txt | wc -l
2928189
cat exec_strings.txt | grep <culprit_libname> | wc -l
1108426
to see that, of the roughly 2.9 million extracted strings, about 1.1 million (~38%) come from this library. Finally, running
cat exec_strings.txt | wc -c
3659369876
cat exec_strings.txt | grep <culprit_libname> | wc -c
3601918899
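it became apparent that these ~1.1 million strings are extremely long (about 3,250 bytes each on average, versus roughly 30 for the rest) and account for about 98% of the extracted bytes, i.e. the great bulk of the debug-symbol garbage. So at least now I can focus on this one library while trying to remove the root of the problem.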
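The `grep | wc` counts answer the question for one library at a time. To check that no other library is a comparable offender, the same byte and line totals can be gathered for every C++ qualifier at once. A rough sketch, with these assumptions: any identifier directly followed by `::` is treated as a namespace or class name, a string is charged to every qualifier it mentions (like grep, so shares can add up to more than 100%), and awk's `length()` counts characters, which equals bytes only for ASCII text. `qualifier_totals` is a hypothetical name:

```shell
# For every C++ qualifier ("name::") found in the input, report the total
# bytes of the strings that mention it, its share of all bytes, and the
# number of such strings, largest first. Reads one string per line on stdin.
qualifier_totals() {
  awk '
    {
      # length() counts characters (bytes for ASCII); +1 for the newline,
      # as wc -c counts it.
      bytes = length($0) + 1
      all += bytes
      split("", seen)                 # portable way to empty an array
      rest = $0
      # Walk every identifier that is immediately followed by "::".
      while (match(rest, /[A-Za-z_][A-Za-z0-9_]*::/)) {
        q = substr(rest, RSTART, RLENGTH - 2)
        if (!(q in seen)) {           # count each string once per qualifier
          seen[q] = 1
          lines[q]++
          total[q] += bytes
        }
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
    END {
      for (q in total)
        printf "%.0f %.2f%% %d %s\n", total[q], 100 * total[q] / all, lines[q], q
    }
  ' | sort -rn
}
```

Usage: `qualifier_totals < exec_strings.txt | head -n 20`. Each output line gives total bytes, share of all bytes, number of strings, and the qualifier.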