I wonder why operating on Float64 values is faster than operating on Float16:
julia> rnd64 = rand(Float64, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @benchmark rnd64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.800 μs … 662.140 μs ┊ GC (min … max): 0.00% … 99.37%
Time (median): 2.180 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.457 μs ± 13.176 μs ┊ GC (mean ± σ): 12.34% ± 3.89%
▁██▄▂▂▆▆▄▂▁ ▂▆▄▁ ▂▂▂▁ ▂
████████████████▇▇▆▆▇▆▅▇██▆▆▅▅▆▄▄▁▁▃▃▁▁▄▁▃▄▁▃▁▄▃▁▁▆▇██████▇ █
1.8 μs Histogram: log(frequency) by time 10.6 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark rnd16.^2
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 5.117 μs … 587.133 μs ┊ GC (min … max): 0.00% … 98.61%
Time (median): 5.383 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.716 μs ± 9.987 μs ┊ GC (mean ± σ): 3.01% ± 1.71%
▃▅█▇▅▄▄▆▇▅▄▁ ▁ ▂
▄██████████████▇▆▇▆▆▇▆▇▅█▇████▇█▇▇▆▅▆▄▇▇▆█▇██▇█▇▇▇▆▇▇▆▆▆▆▄▄ █
5.12 μs Histogram: log(frequency) by time 7.48 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
You may ask why I expected the opposite: because Float16 values have lower floating-point precision:
julia> rnd16[1]
Float16(0.627)
julia> rnd64[1]
0.4375452455597999
Shouldn't calculations at lower precision be faster? If they are not, why would anyone use Float16? They might as well use Float128!
As you can see, the effect you are expecting is present for Float32:
julia> rnd64 = rand(Float64, 1000);
julia> rnd32 = rand(Float32, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @btime $rnd64.^2;
616.495 ns (1 allocation: 7.94 KiB)
julia> @btime $rnd32.^2;
330.769 ns (1 allocation: 4.06 KiB) # faster!!
julia> @btime $rnd16.^2;
2.067 μs (1 allocation: 2.06 KiB) # slower!!
Float64 and Float32 have hardware support on most platforms, but Float16 does not, and must therefore be implemented in software.
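To make "implemented in software" concrete, here is a minimal sketch. It assumes the emulation widens each Float16 to Float32, computes there, and rounds the result back to Float16, which is how the Julia manual describes the software fallback. The helper name square_via_f32 is hypothetical and only for illustration; it is not how Base spells this internally.

```julia
# Hypothetical helper: square a Float16 by widening to Float32,
# multiplying there, and rounding the result back to Float16.
# The two conversions per element are the extra work that a
# hardware-supported type such as Float32 does not pay.
square_via_f32(x::Float16) = Float16(Float32(x) * Float32(x))

xs = rand(Float16, 1000)

emulated = square_via_f32.(xs)   # explicit widen, multiply, round back
plain    = xs .^ 2               # what the benchmark above measures

# The product of two Float16 values is exactly representable in Float32,
# so rounding it back gives the same Float16 result either way.
println(emulated == plain)
```

If speed matters more than memory, one option under the same assumption is to keep the data in Float16 for storage only and convert once, for example Float32.(xs), before doing the arithmetic.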
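Note also that you should use variable interpolation ($) when micro-benchmarking. Without it, rnd32 is accessed as a non-constant global variable, so the measurement includes the overhead of working with an untyped global rather than just the cost of the operation itself. The difference is significant here, not least in terms of allocations: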
julia> @btime $rnd32.^2;
336.187 ns (1 allocation: 4.06 KiB)
julia> @btime rnd32.^2;
930.000 ns (5 allocations: 4.14 KiB)