performance haskell benchmarking strictness

Haskell computation performance

I have several different implementations of the same function. The difference lies in the use of bang patterns. The question is, why does perimeterNaiveFast work the same as perimeterStrictFast?

Benchmark results:

Function realisations:

> data Point = Point { x :: Int, y :: Int }
> 
> distance :: Point -> Point -> Double  
> distance (Point x1 y1) (Point x2 y2) =    
>   sqrt $ fromIntegral $ (x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int)
> 
> perimeterNaive :: [Point] -> Double 
> perimeterNaive [] = 0.0
> perimeterNaive points = foldPoints points 0.0   
>   where
>     firstPoint = head points
>     foldPoints []              _   =  0.0
>     foldPoints [lst]           acc = acc + distance firstPoint lst
>     foldPoints (prev:next:rst) acc = foldPoints (next:rst) (acc + distance prev next)
> 
> perimeterStrict :: [Point] -> Double perimeterStrict [] = 0.0
> perimeterStrict points = foldPoints points 0.0   
>   where
>     firstPoint = head points
>     foldPoints []              _    =  0.0
>     foldPoints [lst]           acc  = acc + distance firstPoint lst
>     foldPoints (prev:next:rst) !acc = foldPoints (next:rst) (acc + distance prev next)
> 
> perimeterNaiveFast :: [Point] -> Double perimeterNaiveFast [] = 0.0
> perimeterNaiveFast (first:rest) = foldPoints rest first 0.0   
>   where
>     foldPoints []         lst  acc = acc + distance first lst
>     foldPoints (next:rst) prev acc = foldPoints rst next (acc + distance next prev)
> 
> perimeterStrictFast :: [Point] -> Double perimeterStrictFast [] = 0.0
> perimeterStrictFast (first:rest) = foldPoints rest first 0.0   
>   where
>     foldPoints []         lst  acc = acc + distance first lst
>     foldPoints (next:rst) prev !acc = foldPoints rst next (acc + distance next prev)
> 
> main :: IO () 
> main = defaultMain [ perimeterBench $ 10 ^ (6 :: Int) ]
> 
> perimeterBench :: Int -> Benchmark perimeterBench n = bgroup "Perimiter"   
>   [ bench "Naive"  $ nf perimeterNaive points   
>   , bench "Strict" $ nf perimeterStrict points 
>   , bench "Naive fast" $ nf perimeterNaiveFast points 
>   , bench "Strict fast" $ nf perimeterStrictFast points   
>   ]   
>     where
>       points = map (\i -> Point i i) [1..n]

Solution

GHC's strictness analysis pass has noticed that the Float accumulator is not (and cannot be) lazily consumed, and converted your naive version into your strict version on your behalf.

The reason it has not also done this for your other naive/fast pair is this clause:

foldPoints []              _   =  0.0

Here you ignore the accumulator, and so the compiler believes that maybe in some cases not forcing the computation as you go along may be better. Changing this to

foldPoints []            acc   =  acc

is enough for GHC to make your other naive/strict pair have the same performance on my machine.