This is my code. I can totally understand that benchIfAndSwitch is faster than benchSiwtch because of "branch prediction", but why is benchEnum not the fastest one? There is no if or switch statement.
import java.util.Date;
import java.util.Random;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

public class TestBenchMarks {

    public enum ChannelState {
        CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT
    }

    @State(Scope.Benchmark)
    public static class ExecutionPlan {

        @Param({ "100000" })
        public int size;

        public ChannelState[] states = null;
        public ChannelState2[] states2 = null;

        @Setup
        public void setUp() {
            ChannelState[] values = ChannelState.values();
            states = new ChannelState[size];
            Random random = new Random(new Date().getTime());
            for (int i = 0; i < size; i++) { // was hardcoded to 100000
                int nextInt = random.nextInt(1000000);
                if (nextInt > 100) {
                    states[i] = ChannelState.CONNECTED;
                } else {
                    states[i] = values[nextInt % values.length];
                }
            }
            ChannelState2[] values2 = ChannelState2.values();
            states2 = new ChannelState2[size];
            for (int i = 0; i < size; i++) { // was hardcoded to 100000
                if (i > 100) {
                    states2[i] = ChannelState2.CONNECTED;
                } else {
                    states2[i] = values2[i % values2.length];
                }
            }
        }
    }

    @Fork(value = 5)
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void benchSiwtch(ExecutionPlan plan, Blackhole bh) {
        int result = 0;
        for (int i = 0; i < plan.size; ++i) {
            switch (plan.states[i]) {
                case CONNECTED:
                    result += 1;
                    break;
                case DISCONNECTED:
                    result += 1;
                    break;
                case SENT:
                    result += 1;
                    break;
                case RECEIVED:
                    result += 1;
                    break;
                case CAUGHT:
                    result += 1;
                    break;
            }
        }
        bh.consume(result);
    }

    @Fork(value = 5)
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void benchIfAndSwitch(ExecutionPlan plan, Blackhole bh) {
        int result = 0;
        for (int i = 0; i < plan.size; ++i) {
            ChannelState state = plan.states[i];
            if (state == ChannelState.CONNECTED) {
                result += 1;
            } else {
                switch (state) {
                    case RECEIVED:
                        result += 1;
                        break;
                    case SENT:
                        result += 1;
                        break;
                    case DISCONNECTED:
                        result += 1;
                        break;
                    case CAUGHT:
                        result += 1;
                        break;
                }
            }
        }
        bh.consume(result);
    }

    @Fork(value = 5)
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void benchEnum(ExecutionPlan plan, Blackhole bh) {
        int result = 0;
        for (int i = 0; i < plan.size; ++i) {
            plan.states2[i].handle(result);
        }
        bh.consume(result);
    }

    public enum ChannelState2 {
        /** CONNECTED */
        CONNECTED((int a) -> {
            try {
                a = a + 1;
            } catch (Exception e) {
                e.printStackTrace();
            }
        }),
        /** DISCONNECTED */
        DISCONNECTED((int a) -> {
            try {
                a = a + 1;
            } catch (Exception e) {
                e.printStackTrace();
            }
        }),
        /** SENT */
        SENT((int a) -> {
            try {
                a = a + 1;
            } catch (Exception e) {
                e.printStackTrace();
            }
        }),
        /** RECEIVED */
        RECEIVED((int a) -> {
            try {
                a = a + 1;
            } catch (Exception e) {
                e.printStackTrace();
            }
        }),
        /** CAUGHT */
        CAUGHT((int a) -> {
            try {
                a = a + 1;
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        private final HandlerFunction function;

        ChannelState2(HandlerFunction handleFunction) {
            this.function = handleFunction;
        }

        private void handle(int a) {
            function.handle(a); // note: a is a copy, so the caller's result is unchanged
        }
    }

    private interface HandlerFunction {
        void handle(int a);
    }
}
This is my result.
And this is my result after removing the try-catch from benchEnum.
Thanks for everyone's help!
Good job on using JMH for this. I do invite you to edit your question and include your benchmark results for future reference.
but why is benchEnum not the fastest one? There is no if or switch statement.

This statement implies that method calls are free relative to some sort of grievous expense incurred by if and switch.
That's obviously false. In fact, it is quite the opposite! A method call should be assumed to be roughly an order of magnitude (perhaps that's slightly overstating matters) more expensive than an if or switch. Yes, for crucial method calls that aren't inherently doomed to multiple layers of indirection[1], hotspot will eliminate most of that cost, but no matter where you end up, it's not likely to be more efficient than a switch/if.
In the end, a switch/if, optimally compiled, is at the machine-code level either literally just a check that fails ('if X is true, jump elsewhere', and X is not true, so the only time spent is checking that X is not true), or one that succeeds and results in a single jump. The jump at the end of the case block / the if body is eliminated, because it can be shortcutted to 'restart the loop that contains this if/switch construct', which is a cost we always have to pay no matter what we do[2].
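To make the switch side concrete, here is a sketch of why an enum switch can be this cheap: it behaves essentially like indexing a jump table by the ordinal. This is an illustration only, not the exact bytecode javac emits (javac routes an enum switch through a tableswitch keyed on a synthetic ordinal-mapping array), and the class and method names are made up for the example:

```java
public class SwitchSketch {
    enum State { CONNECTED, DISCONNECTED, SENT }

    // What you write: a switch over the enum.
    static int viaSwitch(State s) {
        switch (s) {
            case CONNECTED:    return 1;
            case DISCONNECTED: return 2;
            default:           return 3;
        }
    }

    // The conceptual equivalent: a flat table indexed by ordinal().
    // No field fetches, no dynamic dispatch; just an index and a jump.
    static final int[] JUMP_TABLE = {1, 2, 3};

    static int viaOrdinal(State s) {
        return JUMP_TABLE[s.ordinal()];
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            // Both paths agree for every constant.
            System.out.println(s + ": " + viaSwitch(s) + " == " + viaOrdinal(s));
        }
    }
}
```

The point is that once hotspot has done its work, the switch collapses to a computed jump on a word-sized value, with nothing to look up.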
Contrast that to a method call, which, under non-optimal operation, involves:

1. Figuring out where ChannelState2.handle lives. This requires scanning a lookup table that links method signatures for the specific type to the place where the actual implementation code lives. That's one expensive lookup + one jump.
2. Jumping to handle, which immediately needs to grab the contents of a field (function), do a dynamic dispatch on that to find out where the code for its handle implementation lives, and then jump there. That's a field fetch + an expensive lookup + a jump.

That's... a lot. Hotspot can easily eliminate that first expensive lookup because enums are final. It can even eliminate that first jump via inlining. But the field fetch + expensive lookup + jump inherent in calling the lambda is a lot more difficult. The expensive-lookup part in particular is hard to eliminate here.
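Those two indirections can be sketched with a stripped-down variant of the ChannelState2 pattern. Note this is an illustrative rewrite, not the question's code: the handler here returns a value instead of being void, purely so the effect is observable.

```java
public class DispatchSketch {
    interface Handler {
        int handle(int a);
    }

    enum State {
        CONNECTED(a -> a + 1),
        DISCONNECTED(a -> a + 1);

        private final Handler handler;  // indirection #1: a field fetch per call

        State(Handler h) { this.handler = h; }

        int handle(int a) {
            // indirection #2: interface dispatch on whatever object
            // the field happens to hold (here, a lambda instance).
            return handler.handle(a);
        }
    }

    public static void main(String[] args) {
        int result = 0;
        for (State s : State.values()) {
            result = s.handle(result);  // each call pays both indirections
        }
        System.out.println(result);     // prints 2
    }
}
```

Every iteration has to load the field and dispatch through the interface before any useful work happens; the switch versions do none of that.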
'Expensive' is a relative term. It's not usually so expensive that it's a concern. However, you're comparing it to the time taken by the JMP instructions involved in hopping through a switch block, which are infinitesimal. In contrast to that tiny cost, invoking a method handle (which is what the code is going to end up running as) is much more expensive.
It's not surprising at all that the method-call-based benchmark is slower than the others. The others end up invoking .ordinal() in the bytecode, but hotspot can eliminate that and probably does, which means those switches end up as a literal 'depending on the numeric value of this word-sized block, jump to one of a few disparate code blocks', which requires no visits to memory whatsoever.
[1] Imagine a method that accepts, say, a java.util.List and ends up invoking .get() on this argument, but you call this method repeatedly, handing it instances of a cavalcade of different actual types (List is an interface; the object you hand to this method must therefore be an instance of some subtype of it), each type having a different get implementation. Here the system is going to have to do some sort of lookup somewhere to figure out which get method to call, because there are loads of different ones. That's what I mean by 'doomed to a layer of indirection': it's intrinsic to this code flow, so a hotspot compilation step can't eliminate it.

In contrast, if on every call you hand it an instance of ArrayList (and not some subtype of it, or at least not something that overrides get), then that can be optimized. Not at first; ArrayList is not final, after all. But a hotspot compiler can, and generally will, use presumptive compilation: it will assume that you are always going to hand it an ArrayList instance and take the layer of indirection out, jumping directly to ArrayList's get method, and it will add some bells and whistles to the system so that if you ever hand it something else, the presumptively compiled code path is invalidated (it was built on an assumption that has now ceased to be true, so the code is no longer valid). This is how hotspot is meant to work: make assumptions based on observed behaviour and simply invalidate the compiled code if one of the assumptions fails; assuming that code is still on the hot path, it will be compiled again, now with more information, soon.
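A minimal sketch of that footnote's scenario (the class and method names here are made up for illustration; the optimization itself happens inside the JIT and is not visible in source code):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class CallSiteSketch {
    // One call site invoking an interface method: which get() runs
    // depends on the runtime type of the argument.
    static Object first(List<Object> list) {
        return list.get(0);
    }

    public static void main(String[] args) {
        List<Object> a = new ArrayList<>(List.of("x"));
        List<Object> b = new LinkedList<>(List.of("y"));

        // If only ArrayList ever reaches this call site, hotspot can
        // presume ArrayList.get and jump to it directly (after a cheap
        // type check guarding the presumption).
        for (int i = 0; i < 3; i++) {
            first(a);
        }

        // Feeding a second concrete type breaks that presumption; the
        // call site now genuinely needs a lookup to pick the right get().
        System.out.println(first(a) + " " + first(b)); // prints "x y"
    }
}
```

The lambda field in ChannelState2 puts the benchmark permanently in the first camp's unlucky cousin: the interface dispatch on the stored handler is intrinsic to the code flow, so the lookup can't be presumed away.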
[2] There is loop unrolling, but that doesn't apply here: you're looping far too often, and the existence of these constructs mostly eliminates loop unrolling as an option.