While evaluating the memory performance of a Power8 processor using perf, I had trouble understanding the difference between the PM_DATA_ALL_* and PM_DATA_* events. Most of the counters exist in both versions, but the descriptions in the oprofile documentation and in the papi_native_avail output are the same, for example:
PM_DATA_FROM_LMEM
The processor's data cache was reloaded from the local chip's Memory due to either only demand loads or demand loads plus prefetches if MMCR1[16] is 1.
I thought I would figure out the difference by measuring some data. If I provide a large enough task, I can observe the expected difference: the *_ALL versions have higher values. I understand the concept of counter multiplexing when measuring with perf.
So what does the ALL in these events actually mean?
After a few more hours of searching, I found another source, directly from IBM, that describes the events as:
PM_DATA_ALL_FROM_LMEM
The processor's data cache was reloaded from the local chip's Memory due to either demand loads or data prefetch
and
PM_DATA_FROM_LMEM
The processor's data cache was reloaded from the local chip's Memory due to a demand load
So the difference is the prefetch loads, which are not included in the second version.
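If that reading is right, subtracting the demand-only counter from the ALL counter should give a rough estimate of the reloads caused by prefetch. Below is a minimal Python sketch of that arithmetic. It assumes the counts were collected with something like perf stat -x, -e pm_data_all_from_lmem,pm_data_from_lmem ./app, that perf exposes the events under these lowercase names on the machine, and that the CSV lines start with the counter value, the unit, and the event name. The sample numbers are made up for illustration and are not measurements.

```python
def parse_perf_stat_csv(text):
    """Parse `perf stat -x,` style output into {event_name: count}.

    Assumed field order per line: value, unit, event name, ...
    Lines whose value is not a plain integer (for example
    "<not counted>") are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        if not value.isdigit():
            continue
        counts[event] = int(value)
    return counts


def prefetch_reloads(counts, all_event, demand_event):
    """Estimate prefetch-driven reloads as ALL minus demand-only."""
    return counts[all_event] - counts[demand_event]


# Hypothetical sample output, not real measurements.
SAMPLE = """\
120000,,pm_data_all_from_lmem,1000000,100.00,,
90000,,pm_data_from_lmem,1000000,100.00,,
"""

counts = parse_perf_stat_csv(SAMPLE)
print(prefetch_reloads(counts, "pm_data_all_from_lmem", "pm_data_from_lmem"))
```

The estimate is only meaningful when both events are counted in the same run. If perf had to multiplex them, the reported values are scaled and the difference becomes approximate.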
The PAPI and perf tools simply include the wrong description. These events were contributed directly to oprofile by IBM, but probably with some mistakes or inaccuracies. Browsing through the PAPI/libpfm source, I see that the correct description is in the .pme_short_desc field, but the .pme_long_desc fields are the same for both events. And papi_native_avail reports only the long one.
Thanks for your patience. Summarizing it like this helped me a lot, and I hope it will help somebody struggling with similar issues.