
Running valgrind after restoring from a gem5 checkpoint


We are working on a project in gem5 and suspect a memory leak in a new memory object that we have implemented. Typically, this is easy enough to check: launch a compiled gem5.debug binary under valgrind --leak-check=full. Unfortunately, this memory object doesn't do anything of consequence until the memory mode switches from Atomic to Timing (i.e., after fast-forwarding and restoring from a checkpoint with a different CPU model).

When we run the command: valgrind --leak-check=full --log-file=valgrind-out.txt --track-origins=yes build/<ISA>/gem5.debug -d /path/to/outdir /path/to/python/config.py --checkpoint-restore=1 --other-options...

We get the following output (which occurs after many of the gem5 objects have been created):

build/ARM/base/statistics.hh:277: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.1.0.0
gem5 compiled Mar 15 2022 15:27:16
gem5 started Mar 15 2022 17:10:53
gem5 executing on <host name>, pid <pid>
command line: build/ARM/gem5.debug -d /path/to/outdir /path/to/python/config.py --other-options... --checkpoint-restore=1 --checkpoint-dir /path/to/checkpoint

warn: iobus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: bridge.master is deprecated. `master` is now called `mem_side_port`
warn: membus.master is deprecated. `master` is now called `mem_side_ports`
warn: bridge.slave is deprecated. `slave` is now called `cpu_side_port`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
debug.sh: line 13: 169858 Bus error               valgrind --leak-check=full --log-file=valgrind-out.txt --track-origins=yes build/ARM/gem5.debug -d /path/to/outdir /path/to/python/config.py --other-options... --checkpoint-restore=1 --checkpoint-dir /path/to/checkpoint

The same command still runs without valgrind, so we have strong reason to believe that valgrind is the source of the issue. We also know that valgrind works with gem5 when the simulation starts from the beginning (no checkpoint).

So, our question: is there a way to use valgrind when restoring a gem5 simulation from a checkpoint, or are the two at odds with each other?


Solution

  • According to "How does valgrind work?", valgrind pre-processes and modifies the application before it is run. This would mean that valgrind expects certain pre-allocated pointers in certain locations. Furthermore, the valgrind output may contain the following: Warning: set address range perms: large range [start, end) (undefined). It is likely that large gem5 objects overwrote valgrind metadata and corrupted existing pointers, which would result in a bus error.

    For future reference, we successfully used LibLeak, which has very clear documentation, is easy to use with gem5, and has really low runtime overhead. It also helped us find the memory leak :-)
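To check whether a given valgrind log contains the large-range warning mentioned above, a small script can scan the log file. This is only a sketch: the helper names (find_large_ranges, summarize) are hypothetical, and the regular expression assumes the warning is printed with hexadecimal addresses in the form "large range [0x..., 0x...) (state)".

```python
import re

# Assumed shape of the warning, e.g.
# ==1234== Warning: set address range perms: large range [0x1000, 0x9000) (undefined)
LARGE_RANGE = re.compile(
    r"set address range perms: large range "
    r"\[(0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\) \((\w+)\)"
)


def find_large_ranges(log_text):
    """Return (start, end, size_in_bytes, state) for each large-range warning."""
    results = []
    for match in LARGE_RANGE.finditer(log_text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        results.append((start, end, end - start, match.group(3)))
    return results


def summarize(log_path):
    """Print one line per large-range warning found in the log at log_path."""
    with open(log_path, errors="replace") as f:
        ranges = find_large_ranges(f.read())
    for start, end, size, state in ranges:
        print(f"{start:#x}-{end:#x}: {size / 2**20:.1f} MiB ({state})")
    return ranges
```

With the log file name from the question, this would be called as summarize("valgrind-out.txt"). Finding such warnings does not prove they caused the bus error; it only shows that valgrind saw very large allocations before the crash.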