I'm playing around with async-profiler and perf and decided to measure kernel activity for DirectByteBuffer
disk IO. Here is the code (written in Scala, but its Java version should be obvious):
val path = Paths.get("/tmp/test")
val buffer = ByteBuffer.allocateDirect(4096 * 4096)
def main(args: Array[String]): Unit = {
var fullReadsCount = 0
while (true) {
var bytesRead = 0
var ch: SeekableByteChannel = null
try {
ch = Files.newByteChannel(path)
while (bytesRead >= 0) {
bytesRead = ch.read(buffer)
buffer.clear()
}
} finally {
if (ch != null)
ch.close()
}
fullReadsCount += 1
if(fullReadsCount % 100 == 0) println(fullReadsCount)
}
}
I ran this code multiple times and did both perf
and async-profiler
and noticed the following result:
async-profiler
$~/profiler.sh -i 28169 -d 30 <pid>
//.... stack traces ommited
ns percent samples top
264788732 61.02% 9317 copy_user_enhanced_fast_string_[k]
41510919 9.57% 1467 generic_file_read_iter_[k]
9333863 2.15% 331 find_get_entry_[k]
4181131 0.96% 148 __radix_tree_lookup_[k]
4057194 0.94% 143 copy_page_to_iter_[k]
1860485 0.43% 63 __d_lookup_rcu_[k]
1610407 0.37% 50 _raw_spin_unlock_irqrestore_[k]
perf sudo perf record -F 31499 -g -p <pid> -- sleep 30
Averagely among all runs I did I noticed that copy_user_enhanced_fast_string
percentage is different in perf
and async-profiler
61.02%
vs 77.65%
QUESTION: Why is the percentage different for copy_user_enhanced_fast_string
sampled by perf
and async-profiler
? I tried to provide the equal condition (Frequency and sampling period and I did not run both of the profiler simulatenously. 31499 Hz ≈ 28169 nanos).
Or my interpretation of the result is wrong?
The selected profiling interval (28 μs) is too short.
Check dmesg
- there are probably kernel warnings like
perf interrupt took too long (18047 > 18000), lowering kernel.perf_event_max_sample_rate to 25000
async-profiler
differs from perf
in the mechanism of processing PMU events. perf
just collects samples in the ring buffer while async-profiler sends a signal to the process in order to invoke application-specific callback. Typically there is no visible difference to a user, but when profiling frequency is too high, signals may introduce additional noise and affect process scheduling.
I suggest to increase profiling interval to at least 100 μs (10000 Hz). This should make measurements more reliable.