[SOLVED] JMH - why do I need Blackhole.consumeCPU()

I'm trying to understand why it would be wise to use Blackhole.consumeCPU().

Here is something I found about Blackhole.consumeCPU() on Google:

Sometimes when we run a benchmark across multiple threads we also want to burn some CPU cycles to simulate CPU busyness when running our code. This can't be a Thread.sleep, as we really want to burn CPU. Blackhole.consumeCPU(long) gives us the capability to do this.

My example code:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @State(Scope.Thread)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class StringConcatAvgBenchmark {

        StringBuilder stringBuilder1;
        StringBuilder stringBuilder2;

        StringBuffer stringBuffer1;
        StringBuffer stringBuffer2;

        String string1;
        String string2;

        /*
         * re-initialize the values before every iteration
         */
        @Setup(Level.Iteration)
        public void init() {
            stringBuilder1 = new StringBuilder("foo");
            stringBuilder2 = new StringBuilder("bar");

            stringBuffer1 = new StringBuffer("foo");
            stringBuffer2 = new StringBuffer("bar");

            string1 = new String("foo");
            string2 = new String("bar");
        }

        @Benchmark
        @Warmup(iterations = 10)
        @Measurement(iterations = 100)
        @BenchmarkMode(Mode.AverageTime)
        public StringBuilder stringBuilder() {
            // operation is very thin and so consuming some CPU
            Blackhole.consumeCPU(100);
            // to avoid dead code optimization returning the value
            return stringBuilder1.append(stringBuilder2);
        }

        @Benchmark
        @Warmup(iterations = 10)
        @Measurement(iterations = 100)
        @BenchmarkMode(Mode.AverageTime)
        public StringBuffer stringBuffer() {
            Blackhole.consumeCPU(100);
            // to avoid dead code optimization returning the value
            return stringBuffer1.append(stringBuffer2);
        }

        @Benchmark
        @Warmup(iterations = 10)
        @Measurement(iterations = 100)
        @BenchmarkMode(Mode.AverageTime)
        public String stringPlus() {
            Blackhole.consumeCPU(100);
            // to avoid dead code optimization returning the value
            return string1 + string2;
        }

        @Benchmark
        @Warmup(iterations = 10)
        @Measurement(iterations = 100)
        @BenchmarkMode(Mode.AverageTime)
        public String stringConcat() {
            Blackhole.consumeCPU(100);
            // to avoid dead code optimization returning the value
            return string1.concat(string2);
        }

        public static void main(String[] args) throws RunnerException {
            Options options = new OptionsBuilder()
                    .include(StringConcatAvgBenchmark.class.getSimpleName())
                    .threads(1).forks(1).shouldFailOnError(true).shouldDoGC(true)
                    .jvmArgs("-server").build();
            new Runner(options).run();
        }
    }

Why are the results of this benchmark better with Blackhole.consumeCPU(100)?

UPDATE:

Output with Blackhole.consumeCPU(100):

    Benchmark                      Mode  Cnt    Score    Error  Units
    StringBenchmark.stringBuffer   avgt   10  398,843 ± 38,666  ns/op
    StringBenchmark.stringBuilder  avgt   10  387,543 ± 40,087  ns/op
    StringBenchmark.stringConcat   avgt   10  410,256 ± 33,194  ns/op
    StringBenchmark.stringPlus     avgt   10  386,472 ± 21,704  ns/op

Output without Blackhole.consumeCPU(100):

    Benchmark                      Mode  Cnt   Score    Error  Units
    StringBenchmark.stringBuffer   avgt   10  51,225 ± 19,254  ns/op
    StringBenchmark.stringBuilder  avgt   10  49,548 ±  4,126  ns/op
    StringBenchmark.stringConcat   avgt   10  50,373 ±  1,408  ns/op
    StringBenchmark.stringPlus     avgt   10  87,942 ±  1,701  ns/op

I think I now know why they used it: without some delay, the benchmarks are too quick.
With Blackhole.consumeCPU(100) you can measure each benchmark better and get more significant results. Is that right?

Solution

Adding artificial delay would not normally improve the benchmark.

But there are some cases where the operation you are measuring contends for some resource, and you need a backoff that only consumes CPU and, hopefully, does nothing else. See e.g. the case in: http://shipilev.net/blog/2014/nanotrusting-nanotime/
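To make the contended case concrete, here is a minimal plain-Java sketch, not a JMH benchmark. Several threads hammer one shared counter and back off between hits by burning CPU locally. The names `BackoffSketch`, `burnCpu` and `runContended` are hypothetical, and `burnCpu` is only a stand-in for Blackhole.consumeCPU(long), not its actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BackoffSketch {

    // Hypothetical stand-in for Blackhole.consumeCPU(tokens): pure arithmetic on
    // local variables, no sleeping and no shared memory. Not JMH's actual code.
    static long burnCpu(long tokens) {
        long t = tokens;
        for (long i = tokens; i > 0; i--) {
            t += (t * 0x5DEECE66DL + 0xBL + i) & 0xFFFFFFFFFFFFL;
        }
        return t; // the caller uses the result, so the loop is not dead code
    }

    // Starts `threads` workers that each increment one shared counter
    // `opsPerThread` times, backing off for `backoffTokens` between increments.
    // Returns the final counter value.
    static long runContended(int threads, int opsPerThread, long backoffTokens) {
        AtomicLong shared = new AtomicLong(); // the contended resource
        AtomicLong sink = new AtomicLong();   // keeps the burnCpu results alive
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                for (int op = 0; op < opsPerThread; op++) {
                    shared.incrementAndGet();        // operation under contention
                    local += burnCpu(backoffTokens); // thread-local, CPU-only backoff
                }
                sink.addAndGet(local);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        // Every increment is applied exactly once: 4 * 1000 = 4000.
        System.out.println(runContended(4, 1_000, 50));
    }
}
```

A larger `backoffTokens` value means each thread spends more time away from the shared counter, which lowers the contention on it. A Thread.sleep would not give that effect, because it parks the thread instead of keeping the CPU busy.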

The benchmark in the original question is not such a case, so I'd speculate that Blackhole.consumeCPU is used there without a good reason, or at least that the reason is not called out in the comments. Don't do that.
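The numbers in the question itself hint at why the delay does not help. The sketch below redoes the arithmetic on the posted scores, assuming the commas in the tables are decimal separators (so `87,942` is 87.942 ns/op). The class and method names are hypothetical, and comparing a gap against the reported error is only a rough sanity check, not a statistical test.

```java
public class DelayOverheadSketch {

    // Difference between two average times, in ns/op.
    static double gap(double slower, double faster) {
        return slower - faster;
    }

    // Rough check: a gap no larger than the reported error is hard to
    // tell apart from measurement noise.
    static boolean withinError(double gap, double error) {
        return Math.abs(gap) <= error;
    }

    public static void main(String[] args) {
        // Scores in ns/op copied from the question's two result tables.
        double plusWithout = 87.942, builderWithout = 49.548;
        double plusWith = 386.472, builderWith = 387.543;
        double builderErrorWithout = 4.126, builderErrorWith = 40.087;

        double gapWithout = gap(plusWithout, builderWithout);
        double gapWith = gap(plusWith, builderWith);
        double addedByDelay = gap(builderWith, builderWithout);

        System.out.println("stringPlus vs stringBuilder, no delay:   " + gapWithout);
        System.out.println("stringPlus vs stringBuilder, with delay: " + gapWith);
        System.out.println("time added by consumeCPU(100):           " + addedByDelay);
        System.out.println("gap hidden in error without delay? "
                + withinError(gapWithout, builderErrorWithout));
        System.out.println("gap hidden in error with delay?    "
                + withinError(gapWith, builderErrorWith));
    }
}
```

On these figures the delay adds roughly 340 ns per operation. Without the delay, stringPlus and stringBuilder differ by about 38 ns/op, well outside the error of about 4 ns/op. With the delay, the difference shrinks to about 1 ns/op against an error of about 40 ns/op, so the one visible difference between the methods disappears rather than becoming clearer.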