caliper

How to control the exact number of tests to run with caliper


I tried to understand, what is the proper way to control the number of runs: is it the trial or rep? It is confusing: I run the benchmark with --trial 1 and recieve the output:

0% Scenario{vm=java, trial=0, benchmark=SendPublisher} 1002183670.00 ns; Ï=315184.24 ns @ 3 trials

It looks like 3 trials were run. What are that trials? What are reps? I can control the rep value with the options --debug & --debug-reps, but what is the value when running w/o debug? I need to know how many times exactly my tested method was called.


Solution

  • Between Caliper 0.5 and 1.0 a bit of the terminology has changed, but this should explain it for both. Keep in mind than things were a little murky in 0.5, so most of the changes made for 1.0 were to make things clearer and more precise.

    Caliper 0.5

    A single invocation of Caliper is a run. Each run has some number of trials, which is just another iteration of all of the work performed in a run. Within each trial, Caliper executes some number of scenarios. A scenario is the combination of VM, benchmark, etc. The runtime of a scenario is measured by timing the execution of some number of reps, which is the number passed to your benchmark method at runtime. Multiple reps are, of course, necessary because it would be impossible to get precise measurements for a single invocation in a microbenchmark.

    Caliper 1.0

    Caliper 1.0 follows a pretty similar model. A single invocation of Caliper is still a run. Each run consists of some number of trials, but a trial is more precisely defined as an invocation of a scenario measured with an instrument.

    A scenario is roughly defined as what you're measuring (host, VM, benchmark, parameters) and the instrument is what code performs the measurement and how it was configured. The idea being that if a perfectly repeatable benchmark were a function of the form f(x)=y, Caliper would be defined as instrument(scenario)=measurements.

    Within the execution of the runtime instrument (it's similar for others), there is still the same notion of reps, which is the number of iterations passed to the benchmark code. You can't control the rep value directly since each instrument will perform its own calculation to determine what it should be.

    At runtime, Caliper plans its execution by calculating some number of experiments, which is the combination of instrument, benchmark, VM and parameters. Each experiment is run --trials number of times and reported as an individual trial with its own ID.

    How to use the reps parameter

    Traditionally, the best way to use the reps parameter is to include a loop in your benchmark code that looks more or less like:

    for (int i = 0; i < reps; i++) {…}
    

    This is the most direct way to ensure that the number of reps scales linearly with the reported runtime. That is a necessary property because Caliper is attempting to infer the cost of a single, uniform operation based on the aggregate cost of many. If runtime isn't linear with the number of reps, the results will be invalid. This also implies that reps should not be passed directly to the benchmarked code.