Benchmarking the following:
#!/usr/bin/env stack
-- stack --resolver lts-16.2 script --package async --package criterion
import Control.Concurrent.Async (async, replicateConcurrently_)
import Control.Monad (replicateM_, void)
import Criterion.Main

main :: IO ()
main = defaultMain [
  bgroup "tests" [ bench "sync" $ nfIO syncTest
                 , bench "async" $ nfIO asyncTest
                 ]
  ]

syncTest :: IO ()
syncTest = replicateM_ 100000 dummy

asyncTest :: IO ()
asyncTest = replicateConcurrently_ 100000 dummy

dummy :: IO Int
dummy = return $ fib 10000000000

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
Gives me this:
% ./applicative-v-monad.hs
benchmarking tests/sync
time 2.120 ms (2.075 ms .. 2.160 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 2.040 ms (2.023 ms .. 2.073 ms)
std dev 77.37 μs (54.96 μs .. 122.8 μs)
variance introduced by outliers: 23% (moderately inflated)
benchmarking tests/async
time 475.3 ms (310.7 ms .. 642.8 ms)
0.984 R² (0.943 R² .. 1.000 R²)
mean 527.2 ms (497.9 ms .. 570.9 ms)
std dev 41.30 ms (4.833 ms .. 52.83 ms)
variance introduced by outliers: 21% (moderately inflated)
It is apparent that asyncTest runs much longer than syncTest.
I would have thought that running expensive actions concurrently would be faster than running them in sequence. Is there some flaw in my reasoning?
There are a few problems with this benchmark.
As @David Fletcher pointed out, you are not forcing the computation of fib: return only builds a thunk, so nothing gets evaluated until the result is actually demanded. The fix would normally be as easy as:
dummy :: IO Int
dummy = return $! fib 10000000000
With an argument of 10000000000, though, that is enough to make us wait for an eternity, so the next thing to do is lower it to something more manageable:
dummy :: IO Int
dummy = return $! fib 35
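As an aside (this is not part of the original fix), Control.Exception.evaluate is a roughly equivalent way to force the result inside IO; for an Int, weak head normal form already means full evaluation:

import Control.Exception (evaluate)

-- Roughly equivalent to return $! fib 35: evaluate forces its argument
-- to weak head normal form before the action completes.
dummy :: IO Int
dummy = evaluate (fib 35)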
This would normally be enough, but GHC is too smart: it sees that the computation is pure, so the loop of 100000 iterations gets optimized into a single computation whose result is returned 100000 times; in reality fib is computed only once. Instead, let's make fib depend on the iteration number:
import Control.Concurrent.Async (mapConcurrently_)  -- replaces replicateConcurrently_ in the import list

xs :: [Int]
xs = [1..35]

syncTest :: IO ()
syncTest = mapM_ dummy xs

asyncTest :: IO ()
asyncTest = mapConcurrently_ dummy xs

dummy :: Int -> IO Int
dummy n = return $! fib n
stack script runs the code interpreted and without the threaded runtime, so your code runs slowly and on a single capability. We fix that with manual compilation and some flags:
$ stack exec --resolver lts-16.2 --package async --package criterion -- ghc -threaded -O2 -rtsopts -with-rtsopts=-N bench-async.hs
$ stack exec --resolver lts-16.2 -- ./bench-async
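Since the binary is built with -rtsopts, the number of capabilities can also be overridden at run time with the standard GHC RTS options, for example to pin it to four cores:
$ stack exec --resolver lts-16.2 -- ./bench-async +RTS -N4 -RTS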
Of course, for a full-blown stack project all of these flags go into the cabal file instead, and running stack bench will do the rest.
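For example, a benchmark stanza along these lines would carry the same flags (the stanza and file names here are illustrative, not taken from an actual project):

benchmark bench-async
  type:                exitcode-stdio-1.0
  main-is:             bench-async.hs
  build-depends:       base, async, criterion, scheduler
  ghc-options:         -threaded -O2 -rtsopts -with-rtsopts=-N
  default-language:    Haskell2010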
In the question you have asyncTest = replicateConcurrently_ 100000 dummy. Unless the number of iterations is very low, which it is not, you don't want to use async for this, because spawning at least 100000 threads is not free; it is better to use a work-stealing scheduler for this type of workload. I wrote the scheduler library specifically for this purpose. Here is an example of how to use it:
import qualified Control.Scheduler as S

main :: IO ()
main = defaultMain [
  bgroup "tests" [ bench "sync" $ whnfIO syncTest
                 , bench "async" $ nfIO asyncTest
                 , bench "scheduler" $ nfIO schedulerTest
                 ]
  ]

schedulerTest :: IO ()
schedulerTest = S.traverseConcurrently_ S.Par dummy xs
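For completeness, here is a sketch of how the whole modified script might look with the pieces above put together (assuming the scheduler package is available in the resolver; compile it with the ghc invocation shown earlier rather than running it interpreted):

#!/usr/bin/env stack
-- stack --resolver lts-16.2 script --package async --package criterion --package scheduler
import Control.Concurrent.Async (mapConcurrently_)
import qualified Control.Scheduler as S
import Criterion.Main

main :: IO ()
main = defaultMain [
  bgroup "tests" [ bench "sync" $ whnfIO syncTest
                 , bench "async" $ nfIO asyncTest
                 , bench "scheduler" $ nfIO schedulerTest
                 ]
  ]

xs :: [Int]
xs = [1..35]

syncTest :: IO ()
syncTest = mapM_ dummy xs

asyncTest :: IO ()
asyncTest = mapConcurrently_ dummy xs

schedulerTest :: IO ()
schedulerTest = S.traverseConcurrently_ S.Par dummy xs

-- Force the result with $! so the fib thunk is evaluated inside the benchmark.
dummy :: Int -> IO Int
dummy n = return $! fib n

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)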
Now this will give us more sensible numbers:
benchmarking tests/sync
time 246.7 ms (210.6 ms .. 269.0 ms)
0.989 R² (0.951 R² .. 1.000 R²)
mean 266.4 ms (256.4 ms .. 286.0 ms)
std dev 21.60 ms (457.3 μs .. 26.92 ms)
variance introduced by outliers: 18% (moderately inflated)
benchmarking tests/async
time 135.4 ms (127.8 ms .. 147.9 ms)
0.992 R² (0.980 R² .. 1.000 R²)
mean 134.8 ms (129.7 ms .. 138.0 ms)
std dev 6.578 ms (3.605 ms .. 9.807 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking tests/scheduler
time 109.0 ms (96.83 ms .. 120.3 ms)
0.989 R² (0.956 R² .. 1.000 R²)
mean 111.5 ms (108.0 ms .. 120.2 ms)
std dev 7.574 ms (2.496 ms .. 11.85 ms)
variance introduced by outliers: 12% (moderately inflated)