Tags: haskell, ghc, lazy-evaluation, accumulator, space-leak

Irrefutable pattern does not leak memory in recursion, but why?


The mapAndSum function in the code block all the way below combines map and sum (never mind that another sum is applied in the main function; it just serves to make the output compact). The map is computed lazily, while the sum is computed using an accumulating parameter. The idea is that the result of the map can be consumed without ever having the complete list in memory, and (only) afterwards the sum is available "for free". The main function indicates that we had a problem with irrefutable patterns when calling mapAndSum. Let me explain this problem.

According to the Haskell standard, the irrefutable pattern example let (xs, s) = mapAndSum ... in print xs >> print s is translated into

(\ v ->    print (case v of { (xs, s) -> xs })
        >> print (case v of { (xs, s) -> s }))
$ mapAndSum ...

And hence, both print calls carry a reference to the whole pair, which leads to the whole list being kept in memory.
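Equivalently, with the standard selectors:

let v = mapAndSum ...
in  print (fst v) >> print (snd v)

Each selector thunk keeps the pair v, and hence the complete list, reachable until it is forced.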

We (my colleague Toni Dietze and I) solved this using an explicit case statement (compare "bad" vs. "good2"). By the way, finding this out took us a considerable amount of time!

Now, what we do not understand is two-fold (see the comments marked (TD) in the code below):

  • Why does the let inside mapAndSum's go function not cause the same leak, even though it is also an irrefutable pattern?
  • Why is the "bad?" version memory efficient without GHC's optimizer, but as bad as "bad" with optimization turned on?

One remark concerning our actual application: we want to achieve stream processing (format conversion) of a large text file while also accumulating some value that is then written to a separate file. As indicated, we were successful, but the two questions remain, and answers may improve our understanding of GHC for upcoming tasks and problems.

Thank you!

{-# LANGUAGE BangPatterns #-}

-- Tested with The Glorious Glasgow Haskell Compilation System, version 7.4.2.

module Main where


import Control.Arrow (first)
import Data.List (foldl')
import System.Environment (getArgs)


mapAndSum :: Num a => (a -> b) -> [a] -> ([b], a)
mapAndSum f = go 0
  where go !s (x : xs) = let (xs', s') = go (s + x) xs in (f x : xs', s')
                       -- ^ I have no idea why it works here. (TD)
        go !s []       = ([], s)


main :: IO ()
main = do
  let foo  = mapAndSum (^ (2 :: Integer)) [1 .. 1000000 :: Integer]
  let sum' = foldl' (+) 0
  args <- getArgs
  case args of
    ["bad" ]  -> let (xs, s) = foo in print (sum xs) >> print s
    ["bad?"]  -> print $ first sum' $ foo
              -- ^ Without GHC's optimizer, this version is as memory
              -- efficient as the “good” versions.
              -- With optimization, “bad?” is as bad as “bad”. Why? (TD)
    ["good1"] -> print $ first' sum' $ foo
                   where first' g (x, y) = (g x, y)
    ["good2"] -> case foo of
                    (xs, s) -> print (sum' xs) >> print s
    ["good3"] -> (\ (xs, s) -> print (sum' xs) >> print s) $ foo
    _ -> error "Sorry, I do not understand."

Solution

  • Let me first answer why mapAndSum can work well at all: What you see is (very likely) the effect of an optimization described by Philip Wadler in “Fixing some space leaks with a garbage collector”. Short summary: If the garbage collector sees a thunk of the form fst x, and x is already evaluated to the tuple constructor, e.g. (y,z), it will replace fst x by y, possibly freeing up z if it is not referenced anywhere else.
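    In Haskell terms, here is a toy illustration of the idea (a sketch with made-up names, not GHC internals):

      -- p is a thunk that, once forced, becomes an actual tuple (y, z).
      mkPair :: Int -> (Integer, Integer)
      mkPair n = (sum [1 .. fromIntegral n], product [1 .. fromIntegral n])

      demo :: IO ()
      demo = do
        let p = mkPair 100000
            a = fst p   -- a selector thunk: “first component of p”
            b = snd p   -- a selector thunk: “second component of p”
        print a         -- forces p (and a)
        -- At the next GC, the still-unevaluated selector thunk b can be
        -- overwritten with the second component itself, so it no longer
        -- retains the tuple (and thereby the first component).
        print b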

    In your code, s' will therefore, once the result of go has been evaluated to a tuple and after one round of garbage collection, no longer keep a reference to the tuple; it will have been replaced by the accumulated parameter.
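    Spelled out, the Haskell-report translation of the let inside go reads (a sketch; p is an illustrative name):

      go !s (x : xs) = let p = go (s + x) xs        -- the recursive pair
                       in (f x : fst p, snd p)      -- two selector thunks over p
      go !s []       = ([], s)

    Once the recursive call has been evaluated to an actual tuple, the garbage collector can rewrite snd p to the accumulator itself, exactly as described above.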

    Now let’s look at the other patterns by investigating the Core. The foo binding is compiled to:

    foo_r2eT :: ([Type.Integer], Type.Integer)
    
    foo_r2eT =
      case $wgo_r2eP mapAndSum1 lvl2_r2eS
      of _ { (# ww1_s2d7, ww2_s2d8 #) ->
      (ww1_s2d7, ww2_s2d8)
      }
    

    Here is the code in the "bad" case (lvl18_r2fd is "bad"):

    case eqString ds_dyA lvl18_r2fd of _ {
      False -> $wa_s2da new_s_a14o;
      True ->
        case ds1_dyB of _ {
          [] ->
            case Handle.Text.hPutStr2
                   Handle.FD.stdout lvl17_r2fc True new_s_a14o
            of _ { (# new_s1_X15h, _ #) ->
            Handle.Text.hPutStr2
              Handle.FD.stdout lvl16_r2fb True new_s1_X15h
            };
          : ipv_sIs ipv1_sIt -> $wa_s2da new_s_a14o
        }
    

    We can see that two values bound at the module level are printed, lvl17_r2fc and lvl16_r2fb. Here is their code:

    lvl17_r2fc :: String
    [GblId]
    lvl17_r2fc =
      case foo_r2eT of _ { (xs_Xqp, s_Xq9) ->
      $w$cshowsPrec
        0
        (Data.List.sum_sum' xs_Xqp Data.List.genericDrop2)
        ([] @ Char)
      }
    
    lvl16_r2fb :: String
    [GblId]
    lvl16_r2fb =
      case foo_r2eT of _ { (xs_apS, s_Xqp) ->
      $w$cshowsPrec 0 s_Xqp ([] @ Char)
      }
    

    Why are they bound at the module level, and not inside the expression? This is an effect of lazy lifting, another optimization that increases sharing and hence sometimes has an adverse effect on space performance. See GHC ticket #719 for another occurrence of this effect.
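    Schematically, lazy lifting floats an expression that does not mention any locally bound variable out to the top level, where it is computed at most once but also retained for the rest of the program run. A minimal, self-contained sketch (hypothetical names, not GHC output):

      -- Before: the big sum does not depend on the argument label ...
      report :: String -> String
      report label = label ++ ": " ++ show (sum [1 .. 1000000 :: Integer])

      -- ... so GHC may float it out, roughly as if we had written:
      lvl :: Integer
      lvl = sum [1 .. 1000000]

      report' :: String -> String
      report' label = label ++ ": " ++ show lvl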

    So what happens is that the evaluation of lvl17_r2fc causes foo to be evaluated, and the left entry to be lazily printed. Unfortunately, lvl16_r2fb is still live and retains the full tuple. And because the garbage collector seems to fail to see that this is a selector thunk, Wadler’s optimization does not kick in.

    In contrast, here is the code for "good1" a.k.a. lvl8_r2f1:

                case eqString ds_dyA lvl8_r2f1 of _ {
                  False -> $wa2_s2dI w3_s2cF;
                  True ->
                    case ds1_dyB of _ {
                      [] ->
                        Handle.Text.hPutStr2
                          Handle.FD.stdout lvl7_r2f0 True w3_s2cF;
                      : ipv_sHg ipv1_sHh -> $wa2_s2dI w3_s2cF
                    }
                } } in
    

    where the printed value is this string:

    lvl7_r2f0 :: String
    [GblId]
    lvl7_r2f0 =
      case foo_r2eT of _ { (x_af6, y_af7) ->
      show_tuple
        (:
           @ ShowS
           (let {
              w2_a2bY [Dmd=Just L] :: Type.Integer
    
              w2_a2bY = lgo_r2eU mapAndSum1 x_af6 } in
            \ (w3_a2bZ :: String) ->
              $w$cshowsPrec 0 w2_a2bY w3_a2bZ)
           (:
              @ ShowS
              (\ (w2_a2bZ :: String) ->
                 $w$cshowsPrec 0 y_af7 w2_a2bZ)
              ([] @ ShowS)))
        ([] @ Char)
      }
    

    As you can see, the tuple is taken apart only once, and both values are used, so nothing refers to the tuple as a whole and it can be garbage collected. The same holds for "good2" and "good3".

    Now to "bad?": In the unoptimized case, we get this code:

     case eqString ds_dyA (unpackCString# "bad?")
                     of _ {
                       False -> fail2_dyN realWorld#;
                       True ->
                         case ds1_dyB of _ {
                           [] ->
                             $
                               @ (Type.Integer, Type.Integer)
                               @ (IO ())
                               (System.IO.print
                                  @ (Type.Integer, Type.Integer) $dShow_rzk)
                               ($
                                  @ ([Type.Integer], Type.Integer)
                                  @ (Type.Integer, Type.Integer)
                                  (Control.Arrow.first
                                     @ (->)
                                     Control.Arrow.$fArrow(->)
                                     @ [Type.Integer]
                                     @ Type.Integer
                                     @ Type.Integer
                                     sum'_rzm)
                                  foo_rzl);
                           : ipv_szd ipv1_sze -> fail2_dyN realWorld#
                         }
                     } } in
    

    The implementation of first via *** uses irrefutable patterns, so selector thunks of the kind that the garbage collector handles well are generated.
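    For reference, the (->) instance in Control.Arrow is defined roughly like this (paraphrased; note the irrefutable pattern ~(x, y) on the pair):

      instance Arrow (->) where
        arr f             = f
        (***) f g ~(x, y) = (f x, g y)
      -- first f then amounts to f *** id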

    In the optimized case, things are a bit scattered, but anyway, here is the relevant code (the last value is the one that is being printed):

    w_r2f2 :: Type.Integer
    
    w_r2f2 =
      case foo_r2eT of _ { (x_aI1, y_aI2) ->
      lgo_r2eU mapAndSum1 x_aI1
      }
    
    lvl9_r2f3 :: String -> String
    [GblId, Arity=1]
    lvl9_r2f3 =
      \ (w2_a2bZ :: String) ->
        $w$cshowsPrec 0 w_r2f2 w2_a2bZ
    
    w1_r2f4 :: Type.Integer
    
    w1_r2f4 = case foo_r2eT of _ { (x_aI6, y_aI7) -> y_aI7 }
    
    lvl10_r2f5 :: String -> String
    [GblId, Arity=1]
    lvl10_r2f5 =
      \ (w2_a2bZ :: String) ->
        $w$cshowsPrec 0 w1_r2f4 w2_a2bZ
    
    lvl11_r2f6 :: [ShowS]
    [GblId]
    lvl11_r2f6 =
      :
        @ ShowS lvl10_r2f5 ([] @ ShowS)
    
    lvl12_r2f7 :: [ShowS]
    [GblId]
    lvl12_r2f7 = : @ ShowS lvl9_r2f3 lvl11_r2f6
    
    lvl13_r2f8 :: ShowS
    [GblId]
    lvl13_r2f8 = show_tuple lvl12_r2f7
    
    lvl14_r2f9 :: String
    [GblId]
    lvl14_r2f9 = lvl13_r2f8 ([] @ Char)
    

    The use of first has been inlined. We see two calls to case foo_r2eT, so this is prone for a space leak, despite w1_rf24 looking like a selector (so I’d expect the runtime to apply Wadler’s optimization). Maybe it does not work well for static thunks? Indeed the commentary, if up to date, only talks about dynamic allocated selector thunks. So if your foo were not a module-level-value (or rather lazy-liftable into one) but rather dependent on some input, w1_rf24 might be dynamically allocated and hence eligible for the special treatment. But then the code might look very different anyways.