I have two source files that do roughly the same thing. The only difference is that in the first case the generating function is passed as a parameter, and in the second case a value is.
First case:
module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V
import Criterion.Main

regularVectorGenerator :: (Int -> t) -> V.Vector t
regularVectorGenerator = V.generate 99999

unboxedVectorGenerator :: Unbox t => (Int -> t) -> UB.Vector t
unboxedVectorGenerator = UB.generate 99999

main :: IO ()
main = defaultMain
  [ bench "boxed"   $ whnf regularVectorGenerator (+2137)
  , bench "unboxed" $ whnf unboxedVectorGenerator (+2137)
  ]
Second case:
module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V
import Criterion.Main

regularVectorGenerator :: Int -> V.Vector Int
regularVectorGenerator = flip V.generate (+2137)

unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator = flip UB.generate (+2137)

main :: IO ()
main = defaultMain
  [ bench "boxed"   $ whnf regularVectorGenerator 99999
  , bench "unboxed" $ whnf unboxedVectorGenerator 99999
  ]
What I noticed while benchmarking is that the allocated size of the unboxed vector is, as expected, always the smaller of the two, yet the sizes reported for both vectors vary drastically between the two programs. Here is the output of the
first case:
benchmarking boxed
time 7.626 ms (7.515 ms .. 7.738 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 7.532 ms (7.472 ms .. 7.583 ms)
std dev 164.3 μs (133.8 μs .. 201.3 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **1.680e7** (1.680e7 .. 1.680e7)
y 2357.390 (1556.690 .. 3422.724)
benchmarking unboxed
time 889.1 μs (878.9 μs .. 901.8 μs)
0.998 R² (0.995 R² .. 0.999 R²)
mean 868.6 μs (858.6 μs .. 882.6 μs)
std dev 39.05 μs (28.30 μs .. 57.02 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **4000009.003** (4000003.843 .. 4000014.143)
y 2507.089 (2025.196 .. 3035.962)
variance introduced by outliers: 36% (moderately inflated)
and the second case:
benchmarking boxed
time 1.366 ms (1.357 ms .. 1.379 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 1.350 ms (1.343 ms .. 1.361 ms)
std dev 29.96 μs (21.74 μs .. 43.56 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **2400818.350** (2400810.284 .. 2400826.685)
y 2423.216 (1910.901 .. 3008.024)
variance introduced by outliers: 12% (moderately inflated)
benchmarking unboxed
time 61.30 μs (61.24 μs .. 61.37 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 61.29 μs (61.25 μs .. 61.33 μs)
std dev 122.1 ns (91.64 ns .. 173.9 ns)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **800040.029** (800039.745 .. 800040.354)
y 2553.830 (2264.684 .. 2865.637)
The benchmarked size of the vector decreased by an order of magnitude just by de-parametrizing the function. Can someone explain why?
I compiled both examples with these flags:
-O2 -rtsopts
and launched with
--regress allocated:iters +RTS -T
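Putting those together, the full invocation looked roughly like this (assuming GHC is run directly on a single Main.hs; adapt for cabal or stack as needed):

ghc -O2 -rtsopts Main.hs
./Main --regress allocated:iters +RTS -T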
The difference is that if the generating function is already known inside the benchmarked function, the generator is inlined and the involved Ints are unboxed as well. If the generating function is the benchmark parameter, it cannot be inlined.
From the benchmarking perspective, the second version is the correct one, since in normal usage we want the generating function to be inlined.
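If you still want a runtime parameter in the benchmark while letting GHC see the generating function statically, one option is to bake the function into the benchmarked closure and pass only the size as the whnf argument. A minimal sketch, assuming the same 99999-element setup as above (the benchmark names are purely illustrative):

module Main where

import qualified Data.Vector.Unboxed as UB
import qualified Data.Vector as V
import Criterion.Main

main :: IO ()
main = defaultMain
  [ -- The generating function (+2137) is statically visible inside the
    -- benchmarked closure, so GHC can inline generate and work on unboxed
    -- indices; only the size is supplied by criterion at run time.
    bench "boxed/static-fn"   $ whnf (\n -> V.generate  n (+2137)) 99999
  , bench "unboxed/static-fn" $ whnf (\n -> UB.generate n (+2137)) 99999
  ]

Whether the inlining actually fires can be checked by compiling with -ddump-simpl and looking for a tight unboxed-Int loop instead of calls through an unknown function closure.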