haskellfunctional-programmingdata-sharing

Sharing data in Haskell


I would like to understand how the memory sharing mechanism works in Haskell. Indeed, a way to program a function calculating the terms of the fibonnaci sequence is:

fibo' n = f n
  where 
    f 0 = 1
    f 1 = 1
    f n = fibs !! (n-1) + fibs !! (n-2)
    fibs = [f x | x <- [0..]]

This assumes for there to be an improvement that the fibs list is not evaluated multiple times but is shared between calls but I'm not sure about this assumption or how it works.


Solution

  • As a first approximation, you can just use variable scopes to predict what will be shared.

    All the places that use fibs are within the same scope, so each occurrence of that name resolves to the same object in memory. That means each time f is invoked within one one top-level fibo' call, there is one single fibs list. It's shared across f calls.

    However, fibs is defined in a where scope, attached to a function equation fibo' n =. That means the scope is "inside" each call of the fibo' (on some particular n); all the names defined in the where clause refer to different objects in memory each time fibo' is called. (This is just like how local variables - defined inside a function - work in any mainstream programming language)

    What you have here is a function that will work with a list of pre-computed Fibonacci numbers to save recomputation within one top-level call, but then will throw that away and start from scratch on the next top-level call.

    That may be exactly what you want, so this isn't wrong per se. If fibs was defined in a scope outside the top-level application, then it would be shared across all calls, but that also means it would permanently occupy memory to remain available for the next call. As objects in Haskell can expand in memory as they are evaluated more-deeply, this could be considered a memory leak. Throwing it away after each top-level call instead means multiple calls repeat some work, but you're only using the memory needed to speed up each call (rather than the memory needed for the largest call you have ever made), and only while it is running. So which way is "better" depends on what the rest of your program is doing with this function, not on this function itself.

    Note that none of the definitions in your where clause are actually using the arguments of fibo', and fibo' itself isn't really "doing anything"; it just forwards the call immediately to f. So the only "interesting" effect of putting your code in a where like this is creating a scope where you can define fibs inside the top-level call but outside the inner f calls. (I'm considering the access control effects to be non-interesting). If you didn't want that, and did want fibs to be shared across calls, it would be simpler to just define your code like this:

    fibo'' 0 = 1
    fibo'' 1 = 1
    fibo'' n = fibs !! (n-1) + fibs !! (n-2)
    
    fibs = [fibo'' x | x <- [0..]]
    

    Now fibs is just a single object in the global scope, and will be shared across all calls (but will occupy memory across all calls too).

    If you do care about the access-control (nothing else can access fibs if it's defined in a local scope, but can if it's defined in a global scope), you can instead do this:

    fibo''' = f
      where 
        f 0 = 1
        f 1 = 1
        f n = fibs !! (n-1) + fibs !! (n-2)
        fibs = [f x | x <- [0..]]
    

    Here fibs (and f) are defined in a local scope. But the equation which the where is attached to is just fibo''' = f. This local scope is still "outside" the top-level call, because the scope is "entered" before fibo''' receives its argument. Whereas the original fibo' has the where attached to an equation that has already brought the argument n into scope, so the where scope is "entered" after each top-level call is made.


    Now, I did start this post with "as a first approximation". The compiler is allowed to reorganise your code to try and optimise it. For example it could theoretically notice that fibs doesn't depend on the argument n, and so move it outside the top-level call, making your fibo' behave like fibo'''. Or (again theoretically) it could decide to inline fibs which would remove all sharing and make your fibo' behave like a naiive recursive Fibonacci calculator.

    In practice, however, it is extremely unlikely to do either of these things. It's "allowed" to do any transformation that doesn't change the end result of your code, but the transformations the GHC developers actually put into the compiler are designed to make your code better. So they try very hard not to increase or reduce sharing in a way that causes large memory leaks or slowdowns.

    So while the compiled and optimised code frequently is quite different from what you actually wrote, it's pretty safe to reason about your code under the assumption that "if you give something a name, all uses of that name within the same scope will be shared". You just have to be careful about when exactly local scopes are "entered" so that you know what counts as "within the same scope".