
Why are difference lists more efficient than regular concatenation in Haskell?


I am currently working my way through the Learn You a Haskell book online, and have come to a chapter where the author explains that some list concatenations can be inefficient. For example,

((((a ++ b) ++ c) ++ d) ++ e) ++ f

is supposedly inefficient. The solution the author comes up with is to use 'difference lists' defined as

newtype DiffList a = DiffList {getDiffList :: [a] -> [a] }

instance Monoid (DiffList a) where
    mempty = DiffList (\xs -> [] ++ xs)
    (DiffList f) `mappend` (DiffList g) = DiffList (\xs -> f (g xs))

I am struggling to understand why DiffList is more computationally efficient than a simple concatenation in some cases. Could someone explain to me in simple terms why the above example is so inefficient, and in what way the DiffList solves this problem?


Solution

  • The problem in

    ((((a ++ b) ++ c) ++ d) ++ e) ++ f
    

    is the nesting. The applications of (++) are left-nested, and that's bad; right-nesting

    a ++ (b ++ (c ++ (d ++ (e ++ f))))
    

    would not be a problem. That is because (++) is defined as

    [] ++ ys = ys
    (x:xs) ++ ys = x : (xs ++ ys)
    

    so to find which equation to use, the implementation must dive into the expression tree

                 (++)
                 /  \
              (++)   f
              /  \
           (++)   e
           /  \
        (++)   d
        /  \
     (++)   c
     /  \
    a    b
    

    until it finds out whether the left operand is empty or not. If it's not empty, its head is taken and bubbled to the top, but the tail of the left operand is left untouched, so when the next element of the concatenation is demanded, the same procedure starts again.

    When the concatenations are right-nested, the left operand of (++) is always at the top, and checking for emptiness/bubbling up the head are O(1).

    But when the concatenations are left-nested n layers deep, n nodes of the tree must be traversed to reach the first element, and that traversal is repeated for each element of the result that comes from the first list (n-1 nodes for each element coming from the second list, and so on).
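    The diving and bubbling described above can be modelled with a small sketch. The Expr type and the functions leftNested, rightNested, step and drain are hypothetical names introduced only for illustration. They mimic the two equations of (++) on an explicit tree and count the nodes visited, which is a rough proxy for the evaluation cost and not a measurement of what GHC actually does.

```haskell
-- A concatenation expression: a plain list or an application of (++).
data Expr a = Leaf [a] | App (Expr a) (Expr a)

-- Left-nested tree ((a ++ b) ++ c) ++ ... (needs a non-empty list of lists).
leftNested :: [[a]] -> Expr a
leftNested = foldl1 App . map Leaf

-- Right-nested tree a ++ (b ++ (c ++ ...)) (needs a non-empty list of lists).
rightNested :: [[a]] -> Expr a
rightNested = foldr1 App . map Leaf

-- Demand one element. Returns the element together with the remaining
-- expression (or Nothing if the expression is empty), plus the number
-- of tree nodes visited to find it.
step :: Expr a -> (Maybe (a, Expr a), Int)
step (Leaf [])       = (Nothing, 1)
step (Leaf (x : xs)) = (Just (x, Leaf xs), 1)
step (App l r) = case step l of
  -- (x:xs) ++ ys = x : (xs ++ ys): the head bubbles up, the node stays.
  (Just (x, l'), n) -> (Just (x, App l' r), n + 1)
  -- [] ++ ys = ys: the node collapses to its right operand.
  (Nothing, n)      -> let (res, m) = step r in (res, n + m + 1)

-- Consume the whole expression, summing the visited nodes.
drain :: Expr a -> ([a], Int)
drain e = case step e of
  (Nothing, n)      -> ([], n)
  (Just (x, e'), n) -> let (xs, m) = drain e' in (x : xs, n + m)
```

    For six copies of "hello", snd (step (leftNested ls)) is 6 while snd (step (rightNested ls)) is 2, and every further element of the first list costs the same again in the left-nested tree.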

    Let us consider a = "hello" in

    hi = ((((a ++ b) ++ c) ++ d) ++ e) ++ f
    

    and we want to evaluate take 5 hi. So first, it must be checked whether

    (((a ++ b) ++ c) ++ d) ++ e
    

    is empty. For that, it must be checked whether

    ((a ++ b) ++ c) ++ d
    

    is empty. For that, it must be checked whether

    (a ++ b) ++ c
    

    is empty. For that, it must be checked whether

    a ++ b
    

    is empty. For that, it must be checked whether

    a
    

    is empty. Phew. It isn't, so we can bubble up again, assembling

    a ++ b                             = 'h':("ello" ++ b)
    (a ++ b) ++ c                      = 'h':(("ello" ++ b) ++ c)
    ((a ++ b) ++ c) ++ d               = 'h':((("ello" ++ b) ++ c) ++ d)
    (((a ++ b) ++ c) ++ d) ++ e        = 'h':(((("ello" ++ b) ++ c) ++ d) ++ e)
    ((((a ++ b) ++ c) ++ d) ++ e) ++ f = 'h':((((("ello" ++ b) ++ c) ++ d) ++ e) ++ f)
    

    and for the 'e', we must repeat, and for the 'l's too...

    Drawing a part of the tree, the bubbling up goes like this:

                (++)
                /  \
             (++)   c
             /  \
    'h':"ello"   b
    

    becomes first

         (++)
         /  \
       (:)   c
      /   \
    'h'   (++)
          /  \
     "ello"   b
    

    and then

          (:)
          / \
        'h' (++)
            /  \
         (++)   c
         /  \
    "ello"   b
    

    all the way back to the top. The tree that finally becomes the right child of the top-level (:) has exactly the same structure as the original tree, unless the leftmost list is empty, in which case the

     (++)
     /  \
    []   b
    

    node is collapsed to just b.

    So if you have left-nested concatenations of short lists, the concatenation becomes quadratic because getting the head of the concatenation is an O(nesting-depth) operation. In general, the concatenation of a left-nested

    ((...((a_d ++ a_{d-1}) ++ a_{d-2}) ...) ++ a_2) ++ a_1
    

    is O(sum [i * length a_i | i <- [1 .. d]]) to evaluate fully.
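    As a sanity check on that bound, the formula can be written out directly. This is only a sketch: leftNestedBound and rightNestedBound are hypothetical helpers, the lists are assumed to be given in source order (a_d first, a_1 last), and the functions evaluate the bounds; they do not measure actual running time.

```haskell
-- Evaluate sum [i * length a_i | i <- [1 .. d]] for the left-nested
-- concatenation (...((a_d ++ a_{d-1}) ++ ...) ++ a_2) ++ a_1.
-- The argument is in source order: [a_d, a_{d-1}, ..., a_1].
leftNestedBound :: [[a]] -> Int
leftNestedBound lists = sum (zipWith (*) [d, d - 1 .. 1] (map length lists))
  where
    d = length lists

-- For comparison: with right-nesting each element is handled once,
-- so the cost is proportional to the total length.
rightNestedBound :: [[a]] -> Int
rightNestedBound = sum . map length
```

    For six lists of five elements the bounds are 105 and 30. For 100 one-element lists they are 5050 and 100, which shows the quadratic growth for many short lists.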

    With difference lists (sans the newtype wrapper for simplicity of exposition), it's not important whether the compositions are left-nested

    ((((a ++) . (b ++)) . (c ++)) . (d ++)) . (e ++)
    

    or right-nested. Once you have traversed the nesting to reach the (a ++), that (++) is hoisted to the top of the expression tree, so getting at each element of a is again O(1).

    In fact, with difference lists the whole composition is reassociated as soon as you require the first element:

    ((((a ++) . (b ++)) . (c ++)) . (d ++)) . (e ++) $ f
    

    becomes

    ((((a ++) . (b ++)) . (c ++)) . (d ++)) $ (e ++) f
    (((a ++) . (b ++)) . (c ++)) $ (d ++) ((e ++) f)
    ((a ++) . (b ++)) $ (c ++) ((d ++) ((e ++) f))
    (a ++) $ (b ++) ((c ++) ((d ++) ((e ++) f)))
    a ++ (b ++ (c ++ (d ++ (e ++ f))))
    

    and after that, each list is the immediate left operand of the top-level (++) after the preceding list has been consumed.

    The important point is that the prepending function (a ++) can start producing its result without inspecting its argument, so that the reassociation from

                 ($)
                 / \
               (.)  f
               / \
             (.) (e ++)
             / \
           (.) (d ++)
           / \
         (.) (c ++)
         / \
    (a ++) (b ++)
    

    via

               ($)---------
               /           \
             (.)           ($)
             / \           / \
           (.) (d ++) (e ++)  f
           / \
         (.) (c ++)
         / \
    (a ++) (b ++)
    

    to

         ($)
         / \
    (a ++) ($)
           / \
      (b ++) ($)
             / \
        (c ++) ($)
               / \
          (d ++) ($)
                 / \
            (e ++)  f
    

    doesn't need to know anything about the composed functions or the final list f, so it's just an O(depth) rewriting. Then the top-level

         ($)
         / \
    (a ++)  stuff
    

    becomes

     (++)
     /  \
    a    stuff
    

    and all elements of a can be obtained in one step. In this example, where we had pure left-nesting, only one rewriting is necessary. If instead of (for example) (d ++) the function in that place had been a left-nested composition, (((g ++) . (h ++)) . (i ++)) . (j ++), the top-level reassociation would leave that untouched and this would be reassociated when it becomes the left operand of the top-level ($) after all previous lists have been consumed.

    The total work needed for all reassociations is O(number of lists), so the overall cost for the concatenation is O(number of lists + sum (map length lists)). (That means you can bring bad performance to this too, by inserting a lot of deeply left-nested ([] ++).)
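    The unwrapped function representation used in this explanation can be tried out directly. The following is a minimal sketch, not library code: leftNestedPrepends and concatVia are hypothetical names, and the composition is deliberately built left-nested with foldl to match the worst case discussed above.

```haskell
-- Build ((((a ++) . (b ++)) . (c ++)) . ...) as a left-nested composition
-- of prepending functions, starting from id.
leftNestedPrepends :: [[a]] -> ([a] -> [a])
leftNestedPrepends = foldl (\acc xs -> acc . (xs ++)) id

-- Apply the composed function to the final list to get an ordinary list.
concatVia :: [[a]] -> [a] -> [a]
concatVia lists final = leftNestedPrepends lists final
```

    For example, concatVia ["ab", "cd", "ef"] "gh" gives "abcdefgh", and take 5 (concatVia ["hello", "world"] undefined) gives "hello" without ever touching the final list, because each (xs ++) produces output before it inspects its argument.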

    The

    newtype DiffList a = DiffList {getDiffList :: [a] -> [a] }
    
    instance Monoid (DiffList a) where
        mempty = DiffList (\xs -> [] ++ xs)
        (DiffList f) `mappend` (DiffList g) = DiffList (\xs -> f (g xs))
    

    just wraps that so that it is more convenient to handle abstractly.

    DiffList (a ++) `mappend` DiffList (b ++) ~> DiffList ((a ++) . (b ++))
    

    Note that it is only efficient for functions that don't need to inspect their argument to start producing output; if arbitrary functions are wrapped in DiffLists, you have no such efficiency guarantees. In particular, appending ((++ a), wrapped or not) can create left-nested trees of (++) when composed right-nested, so you can recreate the O(n²) concatenation behaviour that way if the DiffList constructor is exposed.
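    For readers on a current compiler, here is a minimal sketch of the wrapper. It assumes GHC 8.4 or later, where Semigroup is a superclass of Monoid, so the combining operation moves into a Semigroup instance. The helpers dlFromList, dlToList and badSnoc are hypothetical names, and mempty is written as DiffList id, which behaves the same as \xs -> [] ++ xs.

```haskell
newtype DiffList a = DiffList { getDiffList :: [a] -> [a] }

-- Combining two difference lists is just function composition.
instance Semigroup (DiffList a) where
  DiffList f <> DiffList g = DiffList (f . g)

instance Monoid (DiffList a) where
  mempty = DiffList id

-- Safe way in: a prepending function never inspects its argument.
dlFromList :: [a] -> DiffList a
dlFromList xs = DiffList (xs ++)

-- Way out: apply the accumulated function to the empty list.
dlToList :: DiffList a -> [a]
dlToList (DiffList f) = f []

-- The caveat from the text: a wrapped (++ xs) has to walk through its
-- argument before it can append, so composing these rebuilds
-- left-nested (++) trees.
badSnoc :: [a] -> DiffList a
badSnoc xs = DiffList (++ xs)
```

    Here dlToList (dlFromList "ab" <> dlFromList "cd") gives "abcd", while dlToList (badSnoc "a" <> badSnoc "b") gives "ba", because it evaluates ([] ++ "b") ++ "a", which is exactly the left-nested shape the difference list was meant to avoid.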