data-structuresf#functional-programmingperformancememory-footprint

How is memory-efficient non-destructive manipulation of collections achieved in functional programming?


I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, ie. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory. (Even if the original collection would be garbage-collected, I'd expect the memory footprint and general performance of such a collection to be awful.)

This is how far I've got until now:

Using F#, I came up with a function insert that splits a list into two pieces and introduces a new element in-between, seemingly without cloning all unchanged elements:

// return a list without its first n elements:
// (helper function)
let rec skip list n =
    if n = 0 then
        list
    else
        match list with
        | []    -> []
        | x::xs -> skip xs (n-1)

// return only the first n elements of a list:
// (helper function)
let rec take list n =
    if n = 0 then
        []
    else
        match list with
        | []    -> []
        | x::xs -> x::(take xs (n-1))

// insert a value into a list at the specified zero-based position:
let insert list position value =
    (take list position) @ [value] @ (skip list position)

I then checked whether objects from an original list are "recycled" in new lists by using .NET's Object.ReferenceEquals:

open System

let (===) x y =
    Object.ReferenceEquals(x, y)

let x = Some(42)
let L = [Some(0); x; Some(43)]
let M = Some(1) |> insert L 1

The following three expressions all evaluate to true, indicating that the value referred to by x is re-used both in lists L and M, ie. that there is only 1 copy of this value in memory:

L.[1] === x
M.[2] === x
L.[1] === M.[2]

My question:

Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

(Btw.: I know about Chris Okasaki's book Purely functional data structures, but haven't yet had the time to read it thoroughly.)


Solution

  • I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, ie. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.

    This page has a few descriptions and implementations of data structures in F#. Most of them come from Okasaki's Purely Functional Data Structures, although the AVL tree is my own implementation since it wasn't present in the book.

    Now, since you asked, about reusing unmodified nodes, let's take a simple binary tree:

    type 'a tree =
        | Node of 'a tree * 'a * 'a tree
        | Nil
    
    let rec insert v = function
        | Node(l, x, r) as node ->
            if v < x then Node(insert v l, x, r)    // reuses x and r
            elif v > x then Node(l, x, insert v r)  // reuses x and l
            else node
        | Nil -> Node(Nil, v, Nil)
    

    Note that we re-use some of our nodes. Let's say we start with this tree:

    When we insert an e into the the tree, we get a brand new tree, with some of the nodes pointing back at our original tree:

    If we don't have a reference to the xs tree above, then .NET will garbage collect any nodes without live references, specifically thed, g and f nodes.

    Notice that we've only modified nodes along the path of our inserted node. This is pretty typical in most immutable data structures, including lists. So, the number of nodes we create is exactly equal to the number of nodes we need to traverse in order to insert into our data structure.

    Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

    Yes.

    Lists, however, aren't a very good data structure, since most non-trivial operations on them require O(n) time.

    Balanced binary trees support O(log n) inserts, meaning we create O(log n) copies on every insert. Since log2(10^15) is ~= 50, the overhead is very very tiny for these particular data structures. Even if you keep around every copy of every object after inserts/deletes, your memory usage will increase at a rate of O(n log n) -- very reasonable, in my opinion.