I have a Google Benchmark such as the following.
#include "benchmark/benchmark.h"
#include <cstring>
static void bench_memset(benchmark::State& state) {
    char buffer[16];
    for (auto _ : state) {
        memset(buffer, '\0', 16);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(bench_memset);
BENCHMARK_MAIN();
And I run it with the following command.
./my_benchmark --benchmark_perf_counters=BRANCH-MISSES,CACHE-MISSES,CACHE-REFERENCES --benchmark_counters_tabular=true
Which results in the following data.
-----------------------------------------------------------------------------------------
Benchmark          Time        CPU Iterations BRANCH-MISSES CACHE-MISSES CACHE-REFERENCES
-----------------------------------------------------------------------------------------
bench_memset   0.611 ns   0.611 ns 1000000000            7n        1000p              52n
I generally understand the concept of branch and cache misses, but I don't understand the meaning of the 'n' and 'p' that are printed after the perf counter metrics.
I searched both Google Benchmark's documentation and perf's documentation, but neither seems to mention this.
I also notice that if I bump up the size of my workload,
static void bench_memset(benchmark::State& state) {
    char buffer[4096];
    for (auto _ : state) {
        memset(buffer, '\0', 4096);
        benchmark::ClobberMemory();
    }
}
then the 'n' changes to 'u', which is also mysterious.
-----------------------------------------------------------------------------------------
Benchmark          Time        CPU Iterations BRANCH-MISSES CACHE-MISSES CACHE-REFERENCES
-----------------------------------------------------------------------------------------
bench_memset    96.0 ns    96.0 ns    6959398      1.14952u            0         81.6163u
I've also noticed with other benchmarks that there may be no letter at all.
What do these letters stand for?
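--benchmark_perf_counters lists additional perf counters to collect, in libpfm format, so the event names themselves are described in the manual of the perfmon2 project. The letters u, p and n are SI prefixes that Benchmark appends when it prints a counter value: p is pico (1e-12), n is nano (1e-9) and u is micro (1e-6). A value with no letter is printed as-is. The counters are hardware event counts, but Benchmark does not print the raw totals; it prints each total divided by the number of iterations. So 7n means 7e-9 branch misses per iteration, which is 7 branch misses over 1000000000 iterations.
In other words, these values are average event counts per iteration. For values far below 1 they can be read roughly as the probability that a single iteration triggers the event.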
IMHO it would be better if it just showed the raw counts; the fractions could easily be computed from them, e.g. 7 / 1e9 or 568 / 6959398.
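If you want the raw counts back, the arithmetic can be reversed. Here is a minimal sketch, not part of Google Benchmark: parse_si_value and raw_count are hypothetical helper names, and the sketch assumes the prefixes are powers of 1000 and that the printed value is the total divided by the iteration count. The result is only approximate because the printed value is rounded.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a Google Benchmark API): turn a printed counter
// such as "7n" or "81.6163u" into a plain double by applying its SI prefix.
double parse_si_value(const std::string& text) {
    char* end = nullptr;
    const double mantissa = std::strtod(text.c_str(), &end);
    switch (*end) {  // first character after the number, '\0' if there is none
        case 'p': return mantissa * 1e-12;  // pico
        case 'n': return mantissa * 1e-9;   // nano
        case 'u': return mantissa * 1e-6;   // micro
        case 'm': return mantissa * 1e-3;   // milli
        case 'k': return mantissa * 1e3;    // kilo (assumed 1000-based)
        case 'M': return mantissa * 1e6;    // mega
        case 'G': return mantissa * 1e9;    // giga
        default:  return mantissa;          // no letter: value is used as-is
    }
}

// Hypothetical helper: approximate the raw event count by multiplying the
// per-iteration value by the iteration count and rounding to the nearest integer.
std::int64_t raw_count(const std::string& text, std::int64_t iterations) {
    return std::llround(parse_si_value(text) * static_cast<double>(iterations));
}
```

For the two runs in the question, raw_count("7n", 1000000000) gives 7 branch misses and raw_count("81.6163u", 6959398) gives 568 cache references, which matches the fractions above.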