Compare the following two snippets:
expensiveFunction :: [Integer] -> [Integer -> Bool]
expensiveFunction [] = [const False]
expensiveFunction (x:xs) =
  let temp = expensiveFunction xs
  in  [someOtherFunction x y | y <- temp] ++ temp
and
expensiveFunction :: [Integer] -> [Integer -> Bool]
expensiveFunction [] = [const False]
expensiveFunction (x:xs) =
  [someOtherFunction x y | y <- expensiveFunction xs] ++ expensiveFunction xs
The first snippet seems more efficient: it only calls expensiveFunction once per recursive call, stores the result in a local variable temp, and then uses that variable twice. However, it could also be that temp acts only as a shorthand for expensiveFunction xs, so that every use of temp actually counts as a call to expensiveFunction.
My question: does the first snippet only call expensiveFunction once? More generally: do local variables defined in let expressions act like they would in other programming paradigms (e.g. in C++)? That is, do they store the value of their defining expressions? Or are they merely shorthand for their defining expressions?
Compiled with GHC and optimizations turned on, the two functions have equivalent performance.
You can prove this to yourself by using the very useful magical incantation:
ghc -ddump-simpl -dsuppress-all -dsuppress-uniques -fforce-recomp -O2 ExpensiveLet.hs
Aside from -O2 to turn on optimizations, the other flags are documented here. The most important flag is -ddump-simpl, which dumps "core", an intermediate representation of your program that's reasonably Haskell-like, after the "simplifier" has run. (The simplifier is an optimization engine that runs repeatedly as a series of "core-to-core" translations and performs most of the big-ticket optimizations of a Haskell program.)
((Digression: The other flags are mostly for convenience. The default core output is packed with additional annotations that are important to the compiler but less important to a human reader, and these can be suppressed individually with specific -dsuppress-* flags, or most of them can be suppressed all at once with -dsuppress-all. That flag doesn't cover "uniques", which are variables that have been renamed with suffixes to make them unique across the program and which, again, are distracting for humans; -dsuppress-uniques suppresses those. Technically, the resulting core can be misleading, because two different variables in the same block of code might both be called x, but in practice this is rarely an issue. The flag -fforce-recomp makes sure that GHC recompiles the file (and dumps the core) even if the program hasn't changed. Normally, GHC won't recompile a file that hasn't changed, which is fine in normal use, but a bit of a problem if you're trying to get the output from -ddump-simpl.
I'd encourage you to try ghc -ddump-simpl -fforce-recomp -O2 ... without the other flags, to see what the output looks like. Also, it can be instructive to compile both with and without -O2 to see the difference optimization makes in the resulting core.
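If the dump is long, you can also ask GHC to write it to a file instead of the terminal by adding the -ddump-to-file flag; the output lands in a file named after the module, something like ExpensiveLet.dump-simpl:
ghc -ddump-simpl -ddump-to-file -dsuppress-all -dsuppress-uniques -fforce-recomp -O2 ExpensiveLet.hs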
There are tons of other interesting -ddump-* flags, too. GHC uses two additional intermediate languages, STG (the "spineless, tagless G-machine" -- don't ask), a low-level functional language you can dump with -ddump-stg-final, and Cmm (C-minus-minus), a sort of processor-independent assembly with a vaguely C-like syntax, which you can dump with -ddump-cmm, before finally getting to assembly with -ddump-asm. You can dump them all at once if you want, with all the unsuppressed annotations:
ghc -ddump-simpl -ddump-stg-final -ddump-cmm -ddump-asm -fforce-recomp -O2 ExpensiveLet.hs
Fun!
End of Digression.))
Anyway, when this command is run on a test program:
module ExpensiveLet where
someOtherFunction = undefined  -- stand-in definition; we only care about the generated core
expensiveFunction1 :: [Integer] -> [Integer -> Bool]
expensiveFunction1 [] = [const False]
expensiveFunction1 (x:xs) =
  let temp = expensiveFunction1 xs
  in  [someOtherFunction x y | y <- temp] ++ temp
expensiveFunction2 :: [Integer] -> [Integer -> Bool]
expensiveFunction2 [] = [const False]
expensiveFunction2 (x:xs) =
  [someOtherFunction x y | y <- expensiveFunction2 xs] ++ expensiveFunction2 xs
you'll see that expensiveFunction1 is compiled to an intermediate representation something like:
expensiveFunction1
  = \ ds ->
      case ds of {
        [] -> lvl12;
        : x xs ->
          let { temp = expensiveFunction1 xs } in
          letrec {
            go1
              = \ ds1 ->
                  case ds1 of {
                    [] -> temp;
                    : y ys -> : (case someOtherFunction of wild2 { }) (go1 ys)
                  }; } in
          go1 temp
      }
while expensiveFunction2 is compiled to:
expensiveFunction2 = expensiveFunction1
GHC already performs the optimization of "lifting" the two uses of expensiveFunction2 xs into a temporary variable (this is common subexpression elimination, or CSE), and it goes even further by determining that its automatically optimized version is actually equivalent to your hand-optimized version, so that the two function definitions are identical and can share the same code. (The odd empty case on someOtherFunction in the core above is just GHC noting that our stand-in undefined can never return, so there are no alternatives to list.)
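If you'd rather watch the sharing happen at runtime than read core, here's a minimal sketch using Debug.Trace; the names expensive, shared, and unshared are mine, not from the question. Compile it without optimizations, since, as we just saw, -O2 is liable to common up the duplicated calls on its own:
import Debug.Trace (trace)

expensive :: Int -> Int
expensive n = trace "expensive was called!" (n * n)

-- The let-bound thunk is shared: forcing t twice evaluates it once,
-- so the trace message prints only once.
shared :: Int -> Int
shared n = let t = expensive n in t + t

-- Two syntactically separate calls: without optimizations these are
-- two separate thunks, so the message prints twice.
unshared :: Int -> Int
unshared n = expensive n + expensive n

main :: IO ()
main = do
  print (shared 3)    -- "expensive was called!" appears once
  print (unshared 3)  -- and here twice (at -O0)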
Note that this is hardly unique to GHC. If you compile a C snippet:
int f(int a, int b)
{
    return 2*a + b;
}

int g(int a)
{
    int x = f(a, a+2);
    int y = f(a, a+2);
    return x+y;
}
then any C compiler worth its salt is going to recognize that x and y are the same value, calculate it once, and double it.
One of the big differences between Haskell and C is that a Haskell compiler, in principle, can optimize g this way without knowing the definition of f. You can't do that in C, since f might rely on internal state that results in f(a, a+2) returning two different values on consecutive calls. So, a C compiler will only optimize g well if the definition of f is known at compile time.
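To see what that purity guarantee buys, here's a sketch of my own (the helper mkStatefulF is a made-up name) of what a "function with internal state" has to look like in Haskell. The statefulness shows up in the type as IO, so the compiler can see that two calls with the same arguments are not interchangeable and won't merge them:
import Data.IORef

-- Builds the Haskell analogue of a C function with a static counter.
-- The type Int -> Int -> IO Int advertises that each call may have effects.
mkStatefulF :: IO (Int -> Int -> IO Int)
mkStatefulF = do
  counter <- newIORef (0 :: Int)
  pure $ \a b -> do
    c <- readIORef counter
    writeIORef counter (c + 1)
    pure (2*a + b + c)  -- the result depends on how often we've been called

main :: IO ()
main = do
  f <- mkStatefulF
  x <- f 1 3  -- 2*1 + 3 + 0 = 5
  y <- f 1 3  -- same arguments, different result: 2*1 + 3 + 1 = 6
  print (x, y)
A pure function of type Int -> Int -> Int simply has nowhere to hide such a counter, which is what licenses the compiler to share the two calls.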
In practice, however, so much more work has been done on making C (and C++) compilers good at optimization that they'll far out-optimize GHC in almost all cases. As an example, for the above snippet, gcc -O2 inlines f and is able to compile the entire function g() to three assembly instructions:
    leal    2(%rdi,%rdi,2), %eax    -- 2 + a + a*2
    addl    %eax, %eax              -- times two
    ret
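If you'd like to reproduce this, asking gcc to print the assembly should do it, assuming the snippet above is saved as g.c; the exact instructions will vary with gcc version and target:
gcc -O2 -S -o - g.c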
GHC actually does okay here. With the equivalent Haskell program:
f :: Int -> Int -> Int
f a b = 2*a + b
g :: Int -> Int
g a = let x = f a (a+2)
          y = f a (a+2)
      in x + y
it optimizes the "core" calculation in g to:
    imulq   $6,%rax    -- 6*a
    addq    $4,%rax    -- +4
but you can be guaranteed that these two instructions are slower than GCC's two, and there's a whole bunch of runtime-related overhead that means the actual Haskell g function is 24 assembly instructions in total.
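If you want to count those instructions yourself, the -ddump-asm flag from the digression above does the trick; here I'm assuming the Haskell snippet is saved as, say, G.hs:
ghc -ddump-asm -fforce-recomp -O2 G.hs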
So, the answers to your question plus some additional questions you didn't ask are:
- Yes: compiled with optimizations, the first snippet only calls expensiveFunction once per recursive call (and, as we saw, so does the second).
- Local variables defined in let expressions are more like shorthand for their defining expressions (immutable bindings) than variables in other languages (mutable references), but this relates to the semantics of the program -- its behavior and "meaning" -- not the performance of the resulting compiled code; see the sketch below.
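To make that last point concrete, here's a tiny sketch of my own showing that a let binding is an equation, not an assignment:
-- In C++, "x = x + 1" updates x in place. In Haskell, a let binding is a
-- definition: the x on the right-hand side refers to the very x being
-- defined, so this typechecks but loops forever instead of incrementing.
loopsForever :: Int
loopsForever = let x = x + 1 in x

-- To get an "updated" value, bind a new name (here x') instead:
increment :: Int -> Int
increment x = let x' = x + 1 in x'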