Benchmarking Filter and Partition

I was testing the performance of the partition function for lists and got some strange results, I think.

We have that partition p xs == (filter p xs, filter (not . p) xs) but we chose the first implementation because it only performs a single traversal over the list. Yet, the results I got say that it maybe be better to use the implementation that uses two traversals.

Here is the minimal code that shows what I'm seeing

import Criterion.Main
import System.Random
import Data.List (partition)

mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)



randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
  where
    (x, gen') = random gen
    xs = randList gen' (n - 1)

main = do
  gen <- getStdGen
  let arg10000000 = randList gen 10000000
  defaultMain [
      bgroup "filters -- split list in half " [
        bench "partition100"         $ nf (partition (>= 50)) arg10000000
      , bench "mypartition100"       $ nf (mypartition (>= 50)) arg10000000
      ]
      ]

I ran the tests both with -O and without it and both times I get that the double traversals is better.

I am using ghc-7.10.3 with criterion-1.1.1.0

My questions are:

Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and (filter p xs, filter (not . p) xs) will only do two traversals if both elements of the tuple are used.
Does this has to do something with the way lists are handled in Haskell?

Thanks a lot!

Solution

There is no black or white answer to the question. To dissect the problem consider the following code:

import Control.DeepSeq
import Data.List (partition)
import System.Environment (getArgs)


mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)


main :: IO ()
main = do
  let cnt = 10000000
      xs = take cnt $ concat $ repeat [1 .. 100 :: Int]
  args <- getArgs
  putStrLn $ unwords $ "Args:" : args
  case args of
    [percent, fun]
      -> let p = (read percent >=)
         in case fun of
           "partition"      ->              print $ rnf $ partition   p xs
           "mypartition"    ->              print $ rnf $ mypartition p xs
           "partition-ds"   -> deepseq xs $ print $ rnf $ partition   p xs
           "mypartition-ds" -> deepseq xs $ print $ rnf $ mypartition p xs
           _ -> err
    _ -> err
  where
    err = putStrLn "Sorry, I do not understand."

I do not use Criterion to have a better control about the order of evaluation. To get timings, I use the +RTS -s runtime option. The different test case are executed using different command line options. The first command line option defines for which percentage of the data the predicate holds. The second command line option chooses between different tests.

The tests distinguish two cases:

The data is generated lazily (2nd argument partition or mypartition).
The data is already fully evaluated in memory (2nd argument partition-ds or mypartition-ds).

The result of the partitioning is always evaluated from left to right, i.e. starting with the list that contains all the elements for which the predicate holds.

In case 1 partition has the advantage that elements of the first resulting list get discarded before all elements of the input list were even produced. Case 1 is especially good, if the predicate matches many elements, i.e. the first command line argument is large.

In case 2, partition cannot play out this advantage, since all elements are already in memory.

For mypartition, in any case all elements are held in memory after the first resulting list is evaluated, because they are needed again to compute the second resulting list. Therefore there is not much of a difference between the two cases.

It seems, the more memory is used, the harder garbage collection gets. Therefore partition is well suited, if the predicate matches many elements and the lazy variant is used.

Conversely, if the predicate does not match many elements or all elements are already in memory, mypartition performs better, since its recursion does not deal with pairs in contrast to partition.

The Stackoverflow question “Irrefutable pattern does not leak memory in recursion, but why?” might give some more insights about the handling of pairs in the recursion of partition.