Does anyone know what function/mechanism dtrace uses for tracking mallocs? I'm trying to profile a piece of code, which I can do with the aid of a debugger and some command-line scripting, e.g.:
sudo dtrace -n "pid`pgrep Mail | head -n 1`::malloc:entry { @sizes=quantize(arg0); }"
Gives me something like:
dtrace: description 'pid31411::malloc:entry ' matched 4 probes
^C
   value  ------------- Distribution ------------- count
      -1 |                                         0
       0 |                                         214
       1 |                                         7
       2 |                                         191
       4 |                                         1054
       8 |@@@@                                     15992
      16 |@@@@@@@@@@@@                             44569
      32 |@@@@@@@@@@                               37003
      64 |@@@@                                     15426
     128 |@@@@                                     15695
     256 |@                                        2616
     512 |@                                        1967
    1024 |@                                        1891
    2048 |@@                                       6010
    4096 |                                         523
    8192 |                                         43
   16384 |                                         110
   32768 |                                         19
   65536 |                                         0
  131072 |                                         69
  262144 |                                         0
But this is really tedious for me. I was wondering how to do this programmatically, from within the code.
I think you're viewing the problem the wrong way around. Your example shows a fairly sophisticated interpretation of an arbitrary argument in an arbitrary combination of process and function — being able to do that in a single line and without modifying your own program is extraordinarily powerful. Attempting to have your own code perform the same analysis makes no sense: what would you do if, e.g., you wanted a linear scale instead of a logarithmic one? Reimplement lquantize(), too?
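To illustrate: with DTrace, a linear scale is just a different aggregating function in the same one-liner. A sketch of your example using lquantize() (the 0-4096 byte range and 256-byte step are arbitrary, chosen purely for illustration):

sudo dtrace -n "pid`pgrep Mail | head -n 1`::malloc:entry { @sizes = lquantize(arg0, 0, 4096, 256); }"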
Focus on writing the code you want and let DTrace do the profiling.
EDIT in response to the first comment.
The execution path for the example you give is extremely circuitous. Very broadly, dtrace(1) requests that the kernel modify malloc's prologue so that, on entry, a calling thread traps into the DTrace kernel module. There, the datum is aggregated within a per-CPU buffer before control is returned to the instrumented thread. At periodic intervals, the dtrace process requests, via libdtrace, a snapshot of the kernel's per-CPU buffers using ioctl(2). Coalescing those buffers and then rendering the graph that you see are also functions performed by libdtrace. On macOS, the libdtrace API, which includes the format of the records exchanged with the kernel, is private. Thus, reusing any of this infrastructure for even your simple example would be "using a sledgehammer to crack a nut".
A further consideration is that you'll be adding code that will itself need to be debugged and maintained. If your code is sufficiently complex that it warrants its own instrumentation, then it seems plausible that, one day, you will want to consider calloc(), realloc() and mmap() as well. Perhaps you will also want to explicitly include or exclude calls to these functions not just from your own code but from other libraries against which it is linked.
Finally, it will almost always be preferable to separate the code that implements your actual task from the code used to debug it. One example approach would be to write your own, instrumented wrapper for malloc() and put it in a shared object that you can interpose between your executable and, presumably, libc.
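On macOS, one way to do that is dyld interposition: put the wrapper in a small dylib and load it with DYLD_INSERT_LIBRARIES. The following is only a sketch: the file name, the counters and the end-of-run report are arbitrary choices, and SIP will prevent injecting it into Apple-signed binaries such as Mail, although it works for executables you build yourself.

/* malloc_shim.c - a hypothetical interposing wrapper for malloc().
 *
 * Build:  clang -dynamiclib -o malloc_shim.dylib malloc_shim.c
 * Run:    DYLD_INSERT_LIBRARIES=./malloc_shim.dylib ./your_program
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* dyld reads (replacement, original) pairs from the __interpose section and
 * rebinds every other loaded image accordingly. */
typedef struct {
    const void *replacement;
    const void *original;
} interpose_t;

static _Atomic unsigned long long g_calls;
static _Atomic unsigned long long g_bytes;

static void *counting_malloc(size_t size)
{
    g_calls += 1;
    g_bytes += size;
    return malloc(size);   /* this image is not rebound, so this is the real malloc */
}

__attribute__((used, section("__DATA,__interpose")))
static const interpose_t interposers[] = {
    { (const void *)counting_malloc, (const void *)malloc },
};

/* Report at process exit; write(2) avoids re-entering the allocator here. */
__attribute__((destructor))
static void report(void)
{
    char buf[128];
    int n = snprintf(buf, sizeof buf, "malloc: %llu calls, %llu bytes requested\n",
                     (unsigned long long)g_calls, (unsigned long long)g_bytes);
    if (n > 0)
        (void)write(STDERR_FILENO, buf, (size_t)n);
}

Calls made to malloc() from inside the interposing dylib itself are not rebound, which is what lets the wrapper reach the real allocator without any dlsym() gymnastics.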