I'm still learning Julia's multiple dispatch and value-as-type approach. Instantiating Val{c}() seems about 50 times slower than a dictionary lookup. After that, dispatch seems 6 times faster than the dictionary lookup.
Are these timings expected? Is it possible to speed up the instantiation of Val{c}()?
using BenchmarkTools
rand_n = rand([4,11], 1_000_000)
simple_dict = Dict(4 => 11, 11 => 4)
call_dict(num) = simple_dict[num]
@benchmark call_dict.($rand_n) # 42.113ms
val_type(::Val{4}) = 11
val_type(::Val{11}) = 4
@benchmark Val.($rand_n) # 2.4s
partial_result = Val.(rand_n)
@benchmark val_type.($partial_result) # 7ms
Tricks like these can be great, but they can also take you into dangerous territory. You get a boost when you have only 2 val_type methods; to reproduce your results,
julia> rand_n = [4, 11, 4]
3-element Vector{Int64}:
4
11
4
julia> vrand_n = Val.(rand_n)
3-element Vector{Val}:
Val{4}()
Val{11}()
Val{4}()
julia> val_type(::Val{4}) = 11
val_type (generic function with 1 method)
julia> val_type(::Val{11}) = 4
val_type (generic function with 2 methods)
julia> using BenchmarkTools
julia> @btime val_type.($vrand_n);
28.421 ns (1 allocation: 112 bytes)
But look what happens when you have 5:
julia> val_type(::Val{2}) = 0
val_type (generic function with 3 methods)
julia> val_type(::Val{3}) = 0
val_type (generic function with 4 methods)
julia> val_type(::Val{7}) = 0
val_type (generic function with 5 methods)
julia> @btime val_type.($vrand_n);
95.008 ns (1 allocation: 112 bytes)
Importantly, I didn't even have to create any such objects to observe the slowdown. Moreover, this is much worse than a fixed version of your Dict-based method:
julia> const simple_dict = Dict(4 => 11, 11 => 4)
Dict{Int64, Int64} with 2 entries:
4 => 11
11 => 4
julia> call_dict(num) = simple_dict[num]
call_dict (generic function with 1 method)
julia> @btime call_dict.($rand_n);
39.674 ns (1 allocation: 112 bytes)
(That const is crucial; see https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables.)
Why? The key is to look at the type of object you're working with:
julia> eltype(vrand_n)
Val
julia> isconcretetype(eltype(vrand_n))
false
This explains why it can be slow: when your iteration extracts the next element, Julia can't predict the concrete type of the object. So it has to use runtime dispatch, which is essentially a glorified dictionary lookup. Unfortunately, it's one where the comparison of keys is much more complicated than just looking up an Int. So you lose quite a lot of performance.
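By contrast, when the container's element type is concrete, Julia can resolve the call at compile time. A minimal sketch of my own (the variable name vals4 is not from the original post):

```julia
# A homogeneous vector has a concrete eltype, so no runtime dispatch is needed.
vals4 = fill(Val(4), 3)            # Vector{Val{4}}
isconcretetype(eltype(vals4))      # true

val_type(::Val{4}) = 11
val_type.(vals4)                   # [11, 11, 11] — each call can be inlined
```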
Why is it so much faster when there are only two methods? Because Julia tries to be really smart, and it checks to see how many there are; if there are 3 or fewer methods, it will generate some optimized code that checks the type in a simple if branch rather than invoking the full machinery of type intersection. You can read more details here.
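One way to see this optimization in action (my own experiment, not from the original post): if you annotate the container with a small concrete Union, the compiler can enumerate the possibilities and union-split the dispatch into a simple branch:

```julia
val_type(::Val{4}) = 11
val_type(::Val{11}) = 4

# The 2-element Union element type tells the compiler exactly which
# types can appear, so it can emit an if/else instead of full dispatch.
vrand_u = Union{Val{4}, Val{11}}[Val(4), Val(11), Val(4)]
val_type.(vrand_u)    # [11, 4, 11]
```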
Newcomers to Julia, once they learn the wonders of specialization and the big runtime performance improvements it delivers, often get excited to try to use the type system for everything, and Val-based dispatch is often a tool they reach for. But inferrability is a key component of the speed advantage of multiple dispatch, so when you use a design that breaks inferrability, you lose the advantage to such an extent that it can be worse than less "fancy" methods.
The bottom line: for the demo you were trying, you'll be much better off if you stick to Dict. There are cases where Val-based dispatch is useful: generally, when a single runtime dispatch sets you up for a whole sequence of subsequently inferrable calls, that can be a win. But you should use it judiciously and (as you have done) always profile your results.
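A hypothetical sketch of that winning pattern (kernel and squares are my own illustrative names): you pay for one dynamic dispatch at a function barrier, and everything downstream specializes on the concrete Val:

```julia
# kernel is compiled separately for each N; ntuple with a Val argument
# can be fully unrolled because N is a compile-time constant inside.
kernel(xs, ::Val{N}) where {N} = ntuple(i -> xs[i]^2, Val(N))

# squares pays a single runtime dispatch to reach kernel; once inside,
# every subsequent call is inferrable.
squares(xs) = kernel(xs, Val(length(xs)))

squares([1, 2, 3])    # (1, 4, 9)
```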