javaperformancebenchmarkingjmh

JMH - Lower-level benchmarking


I'm using JMH to benchmark JUnit tests.

My benchmarks:

import org.openjdk.jmh.annotations.*;

public class Benchmarks {
        @Benchmark
        public void bmUnitTest1() {
                UnitTests.UnitTest1();
        }

        @Benchmark
        public void bmUnitTest2() {
                UnitTests.UnitTest2();
        }
}

My benchmark runner:

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.TimeValue;

import java.util.concurrent.TimeUnit;

public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(Benchmarks.class.getSimpleName())
                .mode(Mode.SingleShotTime)
                .resultFormat(ResultFormatType.CSV)
                .result("target/test-classes/benchmarkcsv/BM " + System.currentTimeMillis() + ".csv")
                .timeUnit(TimeUnit.MILLISECONDS)
                .warmupIterations(3)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(3)
                .measurementTime(TimeValue.seconds(1))
                .timeout(TimeValue.seconds(5))
                .forks(1)
                .warmupForks(1)
                .threads(1)
                .build();

        new Runner(opt).run();
    }
}

Currently I'm getting performance metrics on the whole unit test. I was wondering if it's possible to see these performance metrics on a lower level.


Example:

UnitTest1 has multiple function calls. Function1 takes 10ms, Function2 takes 20ms.

Instead of seeing this (simplified):

UnitTest1: 30ms/op

I'd like to see this:

UnitTest1: 30ms/op
        Function1: 10ms/op
        Function2: 20ms/op

Can I do this in JMH? If not, are there any alternatives I could use to benchmark JUnit tests?


Solution

  • JMH has several profilers that can be enabled while running benchmarks. See JMHSample_35_Profilers for details.

    One of them is async profiler based on async-profiler project. It can do exactly what you want - show which methods called by your unit test take most CPU time (with stack traces!)

    To enable this profiler, call OptionsBuilder.addProfiler(AsyncProfiler.class, "output=flamegraph"), or (if running from a command line) add -prof async:output=flamegraph option.

    When benchmarking completes, JMH will generate a profiler report in .html format:

    JMH profile

    If you want to get a flat profile - just methods with the percentage of CPU time spent - use async profiler without output option:

              ns  percent  samples  top
      ----------  -------  -------  ---
      1992741308   39.53%      200  java.util.stream.AbstractPipeline.copyInto
       844434060   16.75%       84  java.util.stream.StreamOpFlag.isKnown
       670418795   13.30%       67  java.lang.Integer.max
       500609106    9.93%       50  java.util.stream.AbstractPipeline.getStreamAndOpFlags
       451255844    8.95%       45  java.util.stream.AbstractPipeline.evaluate
       221004309    4.38%       22  java.util.stream.ReduceOps$7.makeSink
       140062540    2.78%       14  java.util.Spliterators$IntArraySpliterator.forEachRemaining
        60012022    1.19%        6  java.util.stream.AbstractPipeline.wrapSink
        39966227    0.79%        4  java.util.stream.ReduceOps$6ReducingSink.accept
        39348309    0.78%        4  java.util.stream.ReduceOps$ReduceOp.evaluateSequential
        10862017    0.22%        1  Dictionary::find(unsigned int, Symbol*, Handle)
        10637320    0.21%        1  ciVirtualCallTypeData::translate_from(ProfileData const*)
        10209263    0.20%        1  java.util.stream.AbstractPipeline.<init>
        10155044    0.20%        1  java.util.Spliterator$OfInt.forEachRemaining
        10128043    0.20%        1  PhaseChaitin::Register_Allocate()
        10115108    0.20%        1  _raw_spin_unlock_irqrestore_[k]
         9486501    0.19%        1  PhaseChaitin::Select()
         9281194    0.18%        1  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)