haskellfold

Is foldl or foldr more efficient for intercalating list elements?


| l > 1     = foldr (\a b -> a ++ "-----\n" ++ b) "" xs

xs is a list of strings, and the intention is to join each element in the list with five dashes and a line break.

Would it be more efficient to use foldl or foldr in this scenario, and why? I am aware that intercalate would suit this purpose well, but in my case, I cannot use imports.


Solution

  • foldr is better here, because ++ (and lists in general) are "naturally" right-associative.

    The definition of ++ is:

    (++) :: [a] -> [a] -> [a]
    (++) []     ys = ys
    (++) (x:xs) ys = x : xs ++ ys
    

    Note that the second argument ys is used unmodified in both equations, whereas the first list is pulled apart and its elements prepended onto the front of the new list being constructed. So the stuff that appears on the left of a ++ has to be scanned and rebuilt, while the stuff that appears on the right costs nothing. Even if we completely unfold the recursive definition of ++ in something like:

    let xs = [x1, x2, x3, x4]
     in xs ++ ys
    

    Then we get x1 : x2 : x3 : x4 : ys; ys just gets stored as-is in the field of a data constructor (the most-deeply nested :), and thus is never inspected by the workings of ++. But the entire xs list had to be traversed and rebuilt onto the front of ys.

    So as we fold to build up a list, we want to make sure that the accumulating parameter (which is going to grow bigger and bigger as we have processed more of the list) is always used as the right argument of ++. That way it won't need to be re-traversed at every level of the fold.

    So foldr (\a b -> a ++ "-----\n" ++ b) "" xs is good; it's the b parameter that receives the result of folding the rest of the list, and it is used as the right argument to ++ where it won't need to be examined at all. That means the cost of each level of the fold is the cost of traversing a and the cost of traversing "-----\n". And all together the cost of the entire fold is simply the sum of the costs for all the a values (so the sum of the lengths of the lists in xs), plus a number of small fixed strings equal to the number of elements in xs.

    If we use a left fold like foldl (\b a -> b ++ "-----\n" ++ a) "" xs, the situation is much worse. Here each level of the fold uses the element of the list (a) on the right side of ++, and uses the previous result b on the left side. So we leave a untouched, and have to traverse b to prepend all of its elements onto the front of a. That's good at the bottom of the fold where b is the "" starting value. But at a level out b will be the result of folding in the first a string (plus the "-----\n" separator string), so we have to traverse that a string anyway. And then one level out again b will be the result of ++ing another string onto the front of that, so we have to traverse that new string plus traversing the first one again. In folding the whole list of N elements this way we end up travering the first string N times, the second N-1 times, etc. This is far more costly than using a right fold.

    It might occur to you that you could use a left fold but still put the accumulating parameter on the right side of ++ (so long as you don't mind the order in the final string being reversed). So:

    foldl (\b a -> a ++ "-----\n" ++ b) "" xs
    

    Performance wise this isn't as bad as the non-reversed left fold. At each level we only have to traverse the incoming element from the list, and leave the result of folding the rest of the list unexamined, just like the right fold. The problem here (which is inherent to all left folds, no matter what function you use) is that expanding one level of the foldl recursion (assuming xs isn't empty) gives us:

        foldl (\b a -> a ++ "-----\n" ++ b) "" (x : xs)
    ==> foldl (\b a -> a ++ "-----\n" ++ b) (x ++ "-----\n" ++ "") xs
    

    It expands to another call to foldl, which we have to evaluate before we know what element is at the head of the resulting list. Expanding it one more step (if xs isn't empty) will return another call to foldl. It will keep doing so until we hit the base case and finally return that big chain of ++ applications. So we can't get the first element of the final list until we've traversed the entire original list of strings.

    If we look at the right fold version, we see something different. Expanding one level of the foldr recursion (assuming xs isn't empty) gives us:

        foldr (\a b -> a ++ "-----\n" ++ b) "" (x : xs)
    ==> x ++ "-----\n" ++ foldr (\a b -> a ++ "-----\n" ++ b) "" xs) 
    

    The resulting expression isn't a call to foldr, it's a call to ++ (the left one is the outermost thing being called here; all the rest is inside the right argument of that ++). We only have to expand that ++ one step to determine that the first element of the final folded list is the first character of x (which was the first string in the original list of strings). We can do that without having to traverse any of the rest of the xs list of strings, or even processing the "-----\n" separator once.

    This "streaming" behaviour of foldr (where we can get the first section of its output after only examining the first section of the input list) is what enables foldr to process infinite lists where foldl fundamentally cannot, and if we end up using functions like take that don't examine all of the result list we save a lot of work ever traversing the entire input list. But even in the cases where you do eventually read the entire result (so you'll be traversing the entire input either way), it's still usually faster to do things this way. It allows elements to be processed one at a time all the way through multiple stages of a pipeline of similar "streaming" functions, which is often better for CPU cache locality than doing each of the stages to the entire list before going back to the start and doing the next stage. If the initial producer of the input list and the final consumer of the output list also work in this "streaming" fashion that saves having to store the entire list in memory at once, which can save a lot of time allocating and garbage collecting the memory. There are also optimisations (such as rewrite rules) applying to a lot of the standard list functions that can often optimise away the creation of the : cells for the intermediate lists entirely, which saves even more time.

    Ultimately lists are "naturally" right-associative, because their definition in terms of : cells is recursive on the right. It's almost always better to build and process them in a right-associative fashion if you can.

    The main exception is when you're processing a list into a very small (in terms of memory size) value that doesn't have any substructure you can pattern match on, like an Int or a Double. In that case it's impossible to do that operation in a streaming fashion; if you're getting the length of a list as an Int there's no "first section" of the Int that could be available after traversing only the first section of the list. The Int is all-or-nothing. In that case foldl' (note the ' suffix for a strict left fold) is usually faster, as a right fold would build up a big chain of operations that can't be consumed lazily; the strict left fold can perform them as it goes.