haskell, memory, memory-leaks, graph-reduction

Graph reduction / Tricky space leak example


I'm wondering about a space-leak example I read about on this page (sadly it was not explained there): https://en.wikibooks.org/wiki/Haskell/Graph_reduction

Tricky space leak example:

(\xs -> head xs + last xs) [1..n]

(\xs -> last xs + head xs) [1..n]

The first version runs in O(1) space, the second in O(n).

I'm not sure whether I understand this correctly (I hope you can help). As far as I know, lazy evaluation means leftmost-outermost evaluation, so in these examples we would reduce the redexes something like this:

(\xs -> head xs + last xs) [1..200]

=> ([1..200] -> head xs + last xs)

=> head [1..200] + last [1..200]

=> 1 + last [1..200]

=> 1 + last [2..200]

=> ... ...

=> 1 + last [199..200]

=> 1 + last [200]

=> 1 + 200

=> 201

And

(\xs -> last xs + head xs) [1..200]

=> ([1..200] -> last xs + head xs)

=> last [1..200] + head [1..200]

=> last [2..200] + head [1..200]

=> ... ...

=> last [199..200] + head [1..200]

=> last [200] + head [1..200]

=> 200 + head [1..200]

=> 200 + 1

=> 201

Maybe I reduced it the wrong way (please correct me if I'm wrong), but here I can't see a possible space leak. So I first tested the runtime (not the space usage) in GHCi:

1>> (\xs -> head xs + last xs) [1..50000000]
50000001
(10.05 secs, 4,003,086,632 bytes)
2>> (\xs -> last xs + head xs) [1..50000000]
50000001
(2.26 secs, 3,927,748,176 bytes)

According to wikibooks there should be a space leak in the second version, yet its runtime is much faster (which may well be; there is nothing strange about that).

I have the following source code:

module Main where

main = do
--  let a = (\xs -> head xs + last xs) [1..20000000]    -- space leak
  let b = (\xs -> last xs + head xs) [1..20000000]    -- no space leak
--  putStrLn ("result for a - (\\xs -> head xs + last xs): " ++ show a)
  putStrLn ("result for b - (\\xs -> last xs + head xs): " ++ show b)

... and I compiled it without optimization, then ran the program:

$ ghc -O0 --make -rtsopts -fforce-recomp Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...

$ ./Main +RTS -sstderr
result for b - (\xs -> last xs + head xs): 20000001
   1,600,057,352 bytes allocated in the heap
         211,000 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3062 colls,     0 par    0.012s   0.012s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0002s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.518s  (  0.519s elapsed)
  GC      time    0.012s  (  0.012s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.534s  (  0.532s elapsed)

  %GC     time       2.3%  (2.3% elapsed)

  Alloc rate    3,088,101,743 bytes per MUT second

  Productivity  97.6% of total user, 98.0% of total elapsed

That's a nice result: only 2.3% of the time is spent in garbage collection and the memory in use is approximately 1 MB. Then I compiled the other case, again without optimization, and got the following result:

module Main where

main = do
  let a = (\xs -> head xs + last xs) [1..20000000]    -- space leak
--  let b = (\xs -> last xs + head xs) [1..20000000]    -- no space leak
  putStrLn ("result for a - (\\xs -> head xs + last xs): " ++ show a)
--  putStrLn ("result for b - (\\xs -> last xs + head xs): " ++ show b)

$ ghc -O0 --make -rtsopts -fforce-recomp Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...

$ ./Main +RTS -sstderr
result for a - (\xs -> head xs + last xs): 20000001
   1,600,057,352 bytes allocated in the heap
   2,088,615,552 bytes copied during GC
     540,017,504 bytes maximum residency (13 sample(s))
     135,620,768 bytes maximum slop
            1225 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3051 colls,     0 par    0.911s   0.915s     0.0003s    0.0016s
  Gen  1        13 colls,     0 par    2.357s   2.375s     0.1827s    0.9545s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.434s  (  0.430s elapsed)
  GC      time    3.268s  (  3.290s elapsed)
  EXIT    time    0.094s  (  0.099s elapsed)
  Total   time    3.799s  (  3.820s elapsed)

  %GC     time      86.0%  (86.1% elapsed)

  Alloc rate    3,687,222,801 bytes per MUT second

  Productivity  14.0% of total user, 13.9% of total elapsed

This is much worse: there's a lot of garbage collection going on, and the memory usage is much higher.

What's actually happening here? I can't see why there is a space leak.


P.S. If you compile both cases with full optimization (-O2), then both programs are almost equally efficient.


Solution

  • xs is the same list in both places. When you write out [1..200] twice, you're confusing yourself. It's better to explicitly name all interim entities as they come into existence during the expression's evaluation:

    (\xs -> head xs + last xs) [1,2,3]
    head xs + last xs        { xs = [1,2,3] }
    (+) (head xs) (last xs)
    (+) 1         (last xs)  { xs = 1:xs1 ; xs1 = [2,3] }
                  (last xs1) { xs1 = 2:xs2 ; xs2 = [3] }
                  (last xs2)
                  3
    4
    

    Here we see that the bindings for xs (and xs1) can be safely forgotten as we move along, because there are no more references to xs anywhere.

    So you did reduce it correctly, except that the second [1..200] is the same list as the first. In the second variant we find its last element first, and while last walks the list, the pending head xs still holds on to its front, so the whole list has to be kept in memory; hence the leak.

    Of course a compiler is permitted to optimize this: substituting equals for equals thanks to referential transparency, it could perform the enumeration [1..200] twice, so that the second variant runs in O(1) space too.

    So in the end it is a compiler thing: the space leak might happen, not that it must. A small illustrative sketch follows below.
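
    To make the sharing point concrete, here is a self-contained sketch; the module layout and the names leaky, fine and unshared are my own illustration, not from the question or the wikibooks page. Compiled without optimization and run with +RTS -s, the leaky variant should show a large maximum residency while the other two stay small, although, as said above, an optimizing compiler is free to change that:

    module Main where

    n :: Integer
    n = 20000000

    -- xs is shared: while `last xs` traverses the list, the pending `head xs`
    -- still holds on to its front, so every cell stays reachable: O(n) space.
    leaky :: Integer
    leaky = (\xs -> last xs + head xs) [1..n]

    -- `head xs` is demanded first; afterwards only `last xs` refers to the list,
    -- so each cell can be collected as `last` passes it: O(1) space.
    fine :: Integer
    fine = (\xs -> head xs + last xs) [1..n]

    -- Two textually separate enumerations: compiled with -O0 nothing forces them
    -- to be shared, so each list is consumed and collected independently: O(1).
    unshared :: Integer
    unshared = last [1..n] + head [1..n]

    main :: IO ()
    main = print leaky    -- swap in `fine` or `unshared` to compare

    For example, build with ghc -O0 -rtsopts Main.hs and run ./Main +RTS -s, changing which expression main prints between runs.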