I'm trying to find a formal way to think about the space complexity in haskell. I have found this article about the Graph Reduction (GR) technique which seems to me as a way to go. But I'm having problems applying it in some cases. Consider the following example:
Say we have a binary tree:
data Tree = Node [Tree] | Leaf [Int]
makeTree :: Int -> Tree
makeTree 0 = Leaf [0..99]
makeTree n = Node [ makeTree (n - 1)
, makeTree (n - 1) ]
and two functions that traverse the tree, one (count1) which streams nicely and the the other one (count2) that creates the whole tree in memory at once; according to the profiler.
count1 :: Tree -> Int
count1 (Node xs) = 1 + sum (map count1 xs)
count1 (Leaf xs) = length xs
-- The r parameter should point to the root node to act as a retainer.
count2 :: Tree -> Tree -> Int
count2 r (Node xs) = 1 + sum (map (count2 r) xs)
count2 r (Leaf xs) = length xs
I think I understand how it works in the case of count1, here is what I think happens in terms of graph reduction:
count1 $ makeTree 2
=> 1 + sum $ map count1 xs
=> 1 + sum $ count1 x1 : map count1 xs
=> 1 + count1 x1 + (sum $ map count1 xs)
=> 1 + (1 + sum $ map count1 x1) + (sum $ map count1 xs)
=> 1 + (1 + sum $ (count1 x11) : map count1 x1) + (sum $ map count1 xs)
=> 1 + (1 + count1 x11 + sum $ map count1 x1) + (sum $ map count1 xs)
=> 1 + (1 + count1 x11 + sum $ map count1 x1) + (sum $ map count1 xs)
=> 1 + (1 + 100 + sum $ map count1 x1) + (sum $ map count1 xs)
=> 1 + (1 + 100 + count x12) + (sum $ map count1 xs)
=> 1 + (1 + 100 + 100) + (sum $ map count1 xs)
=> 202 + (sum $ map count1 xs)
=> ...
I think it's clear from the sequence that it runs in constant space, but what changes in case of the count2?
I understand smart pointers in other languages so I vaguely understand that the extra r parameter in count2 function somehow keeps nodes of the tree from beeing destroyed, but I would like to know the exact mechanism, or at least a formal one which I could use in other cases as well.
Thanks for looking.
You could use Adam Bakewell's space semantics,
Haskell currently lacks a standard operational semantics. We argue that such a semantics should be provided to enable reasoning about operational properties of programs, to ensure that implementations guarantee certain space and time behaviour and to help determine the source of space faults. We present a small-step deterministic semantics for the sequential evaluation of Core Haskell programs and show that it is an accurate model of asymptotic space and time usage. The semantics is a formalisation of a graphical notation so it provides a useful mental model as well as a precise mathematical notation. We discuss its implications for education, programming and implementation. The basic semantics is extended with a monadic IO mechanism so that all the space under the control of an implementation is included.
Or work from the Coq specification of the STG machine.