Identifying dead code in a large code repository


I have a large C code base with >100 binaries, >3000 files and >30 libraries. A lot of dead code has accumulated over time, and I'm looking for ways to identify and remove it. The code is simple: no complex macros and very little automatically generated code (lex/bison/...).

To identify "static" dead code (functions and variables), gcc does a good job: the -Wunused-* options report every unused static variable, static function, and so on. My challenge is with non-static global functions and variables, and the code base has a lot of them!

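I've gotten a lot of mileage out of running 'nm' across all the object files: I build a list of all defined global symbols (types 'T', 'D' and 'B' for code, initialized data and uninitialized data) and then remove every symbol that appears as 'U' (undefined, i.e. referenced) anywhere. That process identifies all unreferenced globals. At that point, I have to manually make each symbol static, compile with gcc -Werror -Wunused, and see whether it raises an error: if it does, the symbol is dead; if it doesn't, the symbol is used inside its own file and can simply stay static.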
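To make the gap concrete, here is a hypothetical file (every name in it is invented for illustration) annotated with what gcc -Wunused should report for each definition, based on the documented behaviour of -Wunused-function and -Wunused-variable. The last pair mirrors the manual step above: a former global that still compiles cleanly after being made static is used inside its own file.

```c
/* unused_demo.c: a hypothetical file illustrating what gcc -Wunused reports.
 * Compile with: gcc -c -Wunused unused_demo.c
 */

/* File-scope static variable that is never read: -Wunused-variable warns. */
static int stale_counter;

/* Static function with no caller in this file: -Wunused-function warns. */
static int old_helper(void) { return 1; }

/* Global function with no caller anywhere: gcc cannot know that another
 * file doesn't call it, so there is no warning. Only a cross-object
 * comparison (such as the nm approach) can find this case. */
int never_called(void) { return 2; }

/* A former global made static during cleanup. It is called below, so gcc
 * stays silent: this is live code that can simply remain static. */
static int scale(int x) { return 3 * x; }

/* Exported entry point that other files call. */
int scaled_sum(int a, int b) { return scale(a) + scale(b); }
```

Compiling it with gcc -c -Wunused should warn about stale_counter and old_helper only; never_called produces no diagnostic even though nothing calls it.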

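# Omitting some details for brevity. LC_ALL=C keeps sort and join on the same collation.
# Referenced symbols ("U name" lines); awk drops the per-object "file.o:" headers.
nm --undefined-only lib1.a lib2.a ... obj1 obj2.o obj3.o | awk 'NF == 2 {print $2}' | LC_ALL=C sort -u > refs.txt
# Defined external symbols ("address type name" lines), sorted on the name field.
nm --extern-only --defined-only lib1.a lib2.a ... obj1 obj2.o obj3.o | awk 'NF == 3' | LC_ALL=C sort -k3,3 > defs.txt
# Print the definitions whose name never appears as a reference.
LC_ALL=C join -1 1 -2 3 -v 2 refs.txt defs.txt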

My question: is it possible to use "nm" (or another object-analysis tool such as objdump) to identify which global symbols in an object file are also used inside that same object? That would speed up the dead code elimination by separating genuinely dead global functions from global functions that are actually used (but could be made static).

Alternatively, is there any other existing tool that will do the job?


Solution

  • I suggest using GNU ld's unused-section garbage collection for this.

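    To use it, compile your code with -fdata-sections -ffunction-sections and then link with the -Wl,--gc-sections -Wl,--print-gc-sections flags. Because every function and variable then lives in its own section, ld can discard the unreferenced ones individually, and --print-gc-sections makes it print each section it removes.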

    Here is the output for a sample program:

    /usr/bin/ld: removing unused section '.text.foo' in file '/tmp/ccXZWJ2X.o'
    

    (.text.foo is the section generated for the unused function foo.)

    As a side note, if you use these options there may be no need to clean up the code base manually (other than to make the source cleaner), because the toolchain removes the dead code from the final binaries automatically.
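The sample program behind that output isn't shown. The following is a minimal sketch of the kind of file that would produce it; gc_demo.c, main.c, used_helper and lookup are hypothetical names, and the exact list ld prints may also include sections from the C runtime start files.

```c
/* gc_demo.c: a hypothetical stand-in for the answer's sample program.
 * Build, assuming a separate (hypothetical) main.c whose main() only
 * calls used_helper():
 *   gcc -ffunction-sections -fdata-sections -c gc_demo.c main.c
 *   gcc -Wl,--gc-sections -Wl,--print-gc-sections gc_demo.o main.o -o demo
 * With -ffunction-sections each function is emitted into its own section
 * (.text.foo, .text.used_helper); with -fdata-sections the array below
 * gets its own .data.lookup section, so ld can discard each one separately.
 */

/* Referenced only by foo, so it becomes unreachable together with foo. */
int lookup[4] = {1, 2, 3, 4};

/* Never called by the program: ld should report removing '.text.foo'. */
int foo(int x) { return lookup[x & 3] + 1; }

/* Called from main(), so its section is kept. */
int used_helper(int x) { return 2 * x; }
```

Because --gc-sections keeps only what is reachable from the entry point (and other roots), lookup should be dropped along with foo. Chains of dead code are therefore caught as well, which a single defined-minus-referenced comparison cannot see.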