I am trying to build a debugger that lets me set breakpoints at functions or source lines. The debug information it needs is extracted from the DWARF sections of an ELF file, and I am able to extract this data. The project I want to debug has 50-100 files, so parsing the ELF with readelf or pyelftools for all the DWARF info I need takes about 10 minutes. To speed this up, my next approach was to parse only the debug info for the currently opened source file, but that still takes a few minutes with pyelftools.
How do debuggers get this information so fast? I use an iSystem debugger with winIDEA: it takes about 20 seconds to flash the ELF, and afterwards I can instantly set breakpoints in any source file.
I am new to the topic, so any help is appreciated.
EDIT: This is how I use pyelftools to get function addresses from one file:

    def main():
        dwarfinfo = elffile.get_dwarf_info()
        for CU in dwarfinfo.iter_CUs():
            top_DIE = CU.get_top_DIE()
            if FILENAME in top_DIE.get_full_path():
                die_info_rec(top_DIE)
                return

    def die_info_rec(die):
        if "subprogram" in die.tag:
            # Function found, get data
            return
How do debuggers get this information so fast?
By reading only the info they need (the DWARF format is structured such that you can efficiently skip over translation units and functions you are not interested in), and by doing it in C.
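To illustrate the skipping: each unit in .debug_info starts with a header whose first field is the length of that unit, so a reader can hop from one unit to the next without decoding any of the DIEs inside. Below is a minimal sketch in Python, assuming the raw section bytes are already in memory, uncompressed, and need no relocations. iter_cu_headers is a hypothetical helper name, not part of any library.

```python
import struct


def iter_cu_headers(debug_info: bytes, little_endian: bool = True):
    """Yield (offset, total_size, version) for each unit in raw .debug_info bytes.

    Hypothetical helper for illustration: it reads only the length and
    version fields of each unit header and never decodes a DIE.
    """
    endian = "<" if little_endian else ">"
    offset = 0
    while offset + 6 <= len(debug_info):
        # 32-bit DWARF: a 4-byte unit_length that excludes the field itself.
        (unit_length,) = struct.unpack_from(endian + "I", debug_info, offset)
        length_field_size = 4
        if unit_length == 0xFFFFFFFF:
            # 64-bit DWARF: escape value followed by an 8-byte length.
            if offset + 14 > len(debug_info):
                break
            (unit_length,) = struct.unpack_from(endian + "Q", debug_info, offset + 4)
            length_field_size = 12
        # The 2-byte version follows the length in every DWARF version.
        (version,) = struct.unpack_from(
            endian + "H", debug_info, offset + length_field_size
        )
        total_size = length_field_size + unit_length
        yield offset, total_size, version
        # Jump straight to the next unit header.
        offset += total_size
```

With pyelftools, the raw bytes could be obtained with something like elffile.get_section_by_name('.debug_info').data(). The cost of this loop depends on the number of units, not on the number of functions or variables they describe.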
I need about 10 min to parse the elf with readelf or pyelftools
That is likely a significant part of your problem: parsing readelf output is probably 100 to 1000 times less efficient than reading the info directly.
pyelftools does appear to provide an API to iterate over compilation units, and in theory should be able to provide efficient access.
You didn't show how you are using it, so you may not be doing that efficiently.
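One way this could look with pyelftools is sketched below: examine only the top DIE of each compilation unit, and walk the children of the one unit that matches. The function names functions_in_file and functions_in_elf are hypothetical. The sketch also assumes that functions are direct children of the CU DIE, which is typical for C but not for C++ code using namespaces or classes.

```python
def functions_in_file(dwarfinfo, filename):
    """Return {function_name: low_pc} for the first CU whose path contains filename.

    Hypothetical helper; dwarfinfo is expected to behave like the object
    returned by pyelftools' ELFFile.get_dwarf_info().
    """
    for cu in dwarfinfo.iter_CUs():
        top_die = cu.get_top_DIE()
        if filename not in top_die.get_full_path():
            continue  # skip this CU; only its top DIE was examined
        result = {}
        # Assumption: functions are direct children of the CU DIE.
        for die in top_die.iter_children():
            if die.tag != "DW_TAG_subprogram":
                continue
            name = die.attributes.get("DW_AT_name")
            low_pc = die.attributes.get("DW_AT_low_pc")
            if name is None or low_pc is None:
                continue  # e.g. a declaration with no code address
            # pyelftools stores DW_AT_name as bytes.
            result[name.value.decode("utf-8", "replace")] = low_pc.value
        return result
    return {}


def functions_in_elf(path, filename):
    """Open an ELF file and look up the functions of one source file."""
    # Imported lazily so functions_in_file can be used without pyelftools.
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as stream:
        elffile = ELFFile(stream)
        if not elffile.has_dwarf_info():
            return {}
        return functions_in_file(elffile.get_dwarf_info(), filename)
```

Whether this is fast enough depends on how many compilation units come before the matching one. Caching the result per source file would avoid repeating the scan each time a file is opened.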
Even then, pyelftools is implemented in pure Python, so it is likely at least 10 times slower than something like libdwarf.