
Global / local environment affects Haskell's Criterion benchmark results


We're benchmarking some Haskell code at our company and have just hit a very strange case. The code below benchmarks the same thing twice. The first group uses a Criterion env, which is created once and shared by all runs of a benchmark; the second uses perRunEnv, which creates the env afresh for every run. This is the only difference, yet the version that creates the env for each run appears to be several times faster (roughly 3 to 4 times, going by the time estimates below).

Does anyone know what could cause this? Minimal example:

{-# LANGUAGE BangPatterns        #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Prelude
import Control.Monad
import qualified Data.Vector.Storable.Mutable as Vector
import qualified Data.Vector.Storable         as Vector
import           Data.Vector.Storable         (Vector)
import           Criterion.Main


testf :: Int -> Vector Int -> IO (Vector Int)
testf !i !v = do
    mv <- Vector.unsafeThaw v
    let go j = do
          x <- if j == 0 then return 0 else Vector.unsafeRead mv (j - 1)
          Vector.unsafeWrite mv j (x+1)
          when (j < i - 1) $ go (j + 1)
    go 0
    Vector.unsafeFreeze mv

mkVec :: Int -> IO (Vector Int)
mkVec !i = Vector.unsafeFreeze =<< Vector.new (i + 1)

main :: IO ()
main = do
    defaultMain
        [ bgroup "env per all runs"
            $ (\(i :: Int) -> env (mkVec (10 ^ i))
            $ \v -> bench ("10e" ++ show i)
            $ nfIO (testf (10 ^ i) v))  <$> [7..8]

        , bgroup "env per every run"
            $ (\(i :: Int) -> bench ("10e" ++ show i)
            $ perRunEnv (mkVec (10 ^ i))
            $ (testf (10 ^ i)))  <$> [7..8]
        ]

And the results:

benchmarking env per all runs/10e7
time                 17.34 ms   (17.20 ms .. 17.41 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 17.43 ms   (17.34 ms .. 17.67 ms)
std dev              321.5 μs   (142.1 μs .. 591.3 μs)

benchmarking env per all runs/10e8
time                 173.5 ms   (173.2 ms .. 173.8 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 173.8 ms   (173.6 ms .. 174.0 ms)
std dev              279.5 μs   (194.9 μs .. 355.6 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking env per every run/10e7
time                 4.289 ms   (1.807 ms .. 5.771 ms)
                     0.924 R²   (0.696 R² .. 1.000 R²)
mean                 8.903 ms   (5.752 ms .. 14.20 ms)
std dev              5.029 ms   (249.0 μs .. 6.244 ms)
variance introduced by outliers: 79% (severely inflated)

benchmarking env per every run/10e8
time                 53.76 ms   (30.23 ms .. 98.51 ms)
                     0.940 R²   (0.920 R² .. 1.000 R²)
mean                 102.9 ms   (68.67 ms .. 127.1 ms)
std dev              36.55 ms   (0.0 s .. 41.99 ms)
variance introduced by outliers: 73% (severely inflated)

Solution

  • As with your presumably-coworker's question, I cannot reproduce this issue. Note that I am using GHC 8.2.2 and criterion 1.3.0.0:

    benchmarking env per all runs/10e7
    time                 18.94 ms   (18.71 ms .. 19.22 ms)
                         0.999 R²   (0.998 R² .. 1.000 R²)
    mean                 19.59 ms   (19.39 ms .. 19.99 ms)
    std dev              618.3 μs   (379.7 μs .. 952.8 μs)
    
    benchmarking env per all runs/10e8
    time                 192.0 ms   (189.5 ms .. 194.9 ms)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 191.8 ms   (190.3 ms .. 193.1 ms)
    std dev              1.778 ms   (1.088 ms .. 2.457 ms)
    variance introduced by outliers: 14% (moderately inflated)
    
    benchmarking env per every run/10e7
    time                 18.97 ms   (18.38 ms .. 19.62 ms)
                         0.999 R²   (0.996 R² .. 1.000 R²)
    mean                 18.98 ms   (18.83 ms .. 19.35 ms)
    std dev              298.8 μs   (25.39 μs .. 391.3 μs)
    variance introduced by outliers: 14% (moderately inflated)
    
    benchmarking env per every run/10e8
    time                 194.0 ms   (182.0 ms .. 211.6 ms)
                         0.999 R²   (0.997 R² .. 1.000 R²)
    mean                 192.0 ms   (189.0 ms .. 193.9 ms)
    std dev              2.850 ms   (0.0 s .. 3.261 ms)
    variance introduced by outliers: 19% (moderately inflated)
    
    ./p  18.32s user 1.16s system 99% cpu 19.531 total
    % ghc-pkg list criterion
    ...
        criterion-1.3.0.0
    

    I notice that my version of criterion was uploaded after the bug @jberryman pointed to was fixed, so perhaps that is the difference.
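
If it is unclear which set of numbers to trust, one option is to time the loop by hand, without criterion, and see which group the result resembles. The following is only a rough sketch under some assumptions: it reuses testf and mkVec from the question (with the two vector modules imported under the separate aliases V and MV), timeMs is a hypothetical helper that is not part of criterion, and it takes a single CPU-time measurement per case rather than doing any statistics.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import           Control.Monad                (when)
import qualified Data.Vector.Storable         as V
import qualified Data.Vector.Storable.Mutable as MV
import           System.CPUTime               (getCPUTime)

-- Same loop as testf in the question: cell j becomes (cell j-1) + 1.
testf :: Int -> V.Vector Int -> IO (V.Vector Int)
testf !i !v = do
    mv <- V.unsafeThaw v
    let go j = do
          x <- if j == 0 then return 0 else MV.unsafeRead mv (j - 1)
          MV.unsafeWrite mv j (x + 1)
          when (j < i - 1) $ go (j + 1)
    go 0
    V.unsafeFreeze mv

-- Same as mkVec in the question: an uninitialised vector of i + 1 cells.
mkVec :: Int -> IO (V.Vector Int)
mkVec !i = V.unsafeFreeze =<< MV.new (i + 1)

-- Hypothetical helper (not a criterion function): CPU time of one
-- action in milliseconds. getCPUTime reports picoseconds.
timeMs :: IO a -> IO (a, Double)
timeMs act = do
    t0 <- getCPUTime
    !r <- act
    t1 <- getCPUTime
    return (r, fromIntegral (t1 - t0) / 1e9)

main :: IO ()
main = do
    let n = 10 ^ (7 :: Int)
    -- One vector reused across two runs, as with env.
    shared <- mkVec n
    (_, tFirst)  <- timeMs (testf n shared)
    (_, tSecond) <- timeMs (testf n shared)
    -- A fresh vector allocated outside the timed region, as with perRunEnv.
    fresh <- mkVec n
    (r, tFresh) <- timeMs (testf n fresh)
    putStrLn ("shared vector, 1st run: " ++ show tFirst  ++ " ms")
    putStrLn ("shared vector, 2nd run: " ++ show tSecond ++ " ms")
    putStrLn ("fresh vector:           " ++ show tFresh  ++ " ms")
    -- Sanity check: after the loop, cell j holds j + 1.
    print (r V.! (n - 1) == n)
```

Compile with optimisations (for example ghc -O2) so the numbers are comparable with the criterion runs. A single measurement like this is noisy, so treat it as a plausibility check on the order of magnitude, not as a replacement for the benchmark.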