recursionfunctional-programmingbinary-search-treesmldifference-lists

Fast serialization of BST using a difference list


Background

I'm working through Ullmans Elements of ML programming in my spare-time. End goal is to self-study Andrew Appels Modern Compiler Implementation in ML.

In Elements of ML, Ullman describes the difference list:

There is a trick known to LISP programmers as difference lists, in which one manipulates lists more efficiently by keeping, as an extra parameter of your function, a list that represents in some way what you have already accomplished. The idea comes up in a number of different applications;

Ullman uses reverse as an example of the difference list technique. Here is a slow function that runs in O(n^2).

fun reverse nil = nil
  | reverse (x::xs) = reverse(xs) @ [x]

And the faster one using a difference list

fun rev1(nil, M) = M
  | rev1(x::xs, ys) = rev1(xs, x::ys)

fun reverse L = rev1(L, nil)

My problem

I have this Binary Search Tree (BST) data type.

datatype 'a btree = Empty
      | Node of 'a * 'a btree * 'a btree

A naive solution for collecting a list of the elements in pre-order would be

fun preOrder Empty = nil
  | preOrder (Node(x, left, right)) = [x] @ preOrder left @ preOrder right

But Ullman points out that the @ operator is slow and suggests in exercise 6.3.5 that I implement preOrder using a difference list.

After some head scratching I came up with this function:

fun preOrder tree = let
    fun pre (Empty, L)  = L
      | pre (Node(x, left, right), L) = let
          val L = pre(right, L)
          val L = pre(left, L)
        in
            x::L
        end
    in
       pre (tree, nil)
end

It outputs the elements in pre-order. BUT it evaluates the tree in post-order! And the code is uglier than the naive preOrder one.

> val t = Node(5, 
    Node(3, 
       Node(1, Empty, Empty), 
       Node(4, Empty, Empty)), 
    Node(9, Empty, Empty))
> preOrder t
val it = [5,3,1,4,9] : int list

Prior Art

I tried searching for references to difference lists in ML programming, and found John Hughes original article describing how to use difference lists for reverse.

I also found Matthew Brecknells difference list blog post with examples in Haskell. He makes a distinction between using an accumulator, like Ullmans reverse example and creating a new type for difference lists. He also presents a tree flattener. But I have a hard time understanding the Haskell code and would appreciate a similar expose but in Standard ML. abc


Question


Solution

  • Your eventual method of doing this is is the best it reasonably gets. The nice way to do this turns out to be

    fun preOrderHelper (Empty, lst) = lst
      | preOrderHelper (Node(x, left, right), lst) = 
            x :: preOrderHelper(left, preOrderHelper(right, lst))
    
    fun preOrder tree = preOrderHelper(tree, Nil)
    

    Note that the run time of preOrderHelper(tree, list) is only a function of tree. Call r(t) the run time of preOrderHelper on tree t. Then we have r(Empty) = O(1) and r(Node(x, left, right)) = O(1) + r(left) + r(right), so clearly r(t) is linear in the size of t.

    What is the derivation of this technique? Is there a more principled way of deriving it? In general, when you're turning a data structure into a list, you want to foldr onto an empty list. I don't know enough ML to say what the equivalent of typeclasses is, but in Haskell, we would approach the situation as follows:

    data Tree a = Empty | Node a (Tree a) (Tree a)
    
    instance Foldable Tree where
       foldr f acc t = foldrF t acc  where
          foldrF Empty acc = acc
          foldrF (Node x left right) acc = f x (foldrF left (foldrF right acc))
    

    To convert a Tree a to a [a], we would call Data.Foldable.toList, which is defined in Data.Foldable as

    toList :: Foldable f => f a -> [a]
    toList = foldr (:) []
    

    Unfolding this definition gives us the equivalent of the ML definition above.

    As you can see, your technique is actually a special case of a very principled way to turn data structures into lists.

    In fact, in modern Haskell, we can do this totally automatically.

    {-# LANGUAGE DeriveFoldable #-}
    
    data Tree a = Empty | Node a (Tree a) (Tree a) deriving Foldable
    

    will give us the equivalent(*) of the above Foldable implementation automatically, and we can then immediately use toList. I don't know what the equivalent is in ML, but I'm sure there's something analogous.

    The difference between ML and Haskell is that Haskell is lazy. Haskell's laziness means that the evaluation of preOrder actually walks the tree in the pre-Order order. This is one of the reasons I prefer laziness. Laziness permits very fine-grained control over the order of evaluation without resorting to non-functional techniques.


    (*) (up to the arguments order, which does not count in the lazy Haskell.)