haskellreferential-transparency

Referential transparency confusion in Haskell


First up, despite this being an implementation of the Sieve of Eratosthenes this is not a homework question. I can find implementations of the Sieve in many intro books :).

The question I have is that around my confusion around referential transparency, which was one of the virtues of functional programming.

My understanding was that I could replace f(x) anywhere in my program with y, provided y = f(x) and I was in a pure function.

A naive implementation of the n-th prime using the Sieve "by hand" is

nums0 = [2..]  -- allowed by lazy evaluation
nums1 = [n | n <- nums0, mod n (head nums0) /= 0] -- works, gives [3,5,7,9,11,13,15...]
nums2 = [n | n <- nums1, mod n (head nums1) /= 0] -- works, gives [5,7,11,13,...]
nums3 = [n | n <- nums2, mod n (head nums2) /= 0] -- works, gives [7,11,13,...]

It is clear that there is a recursive pattern here. Let's call ndnp m the function that returns the list of natural numbers greater than 1 that are not divisible by the first m primes i.e.

nums0 = ndnp 0  -- (i.e. all numbers starting at 2)
nums1 = ndnp 1  -- (not divisible by 2)
nums2 = ndnp 2  -- (not divisible by 2 or 3)

To find the Nth prime, we can simply do head $ ndnp (n-1).

The example of constructing the numsX array has a recursive structure:

-- ndnp: Not Divisible by first N Primes
ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]

In particular, ndnp 1 is built from ndnp 0 or nums0

When run take 10 $ ndnp 0 I get the answer I expect. When I run take 10 $ ndnp 1 I get a stackoverflow.

My question:

Note: There is a fix for this I found

ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
     where arr = ndnp (n-1)

which makes my code work, but doesn't give me a good conceptual understanding on when I have referential transparency and when I don't.


Solution

  • If you have warnings turned on (which is good practice -- it's kind of a mystery to me that so many compilers, GHC included, default to not printing warnings), then you'll see something like:

    Referential.hs:6:15: warning: [-Wname-shadowing]
        This binding for ā€˜nā€™ shadows the existing binding
          bound at Referential.hs:6:6
      |
    6 | ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]
                      ^
    

    which will clue you in on what's going wrong. The Haskell scoping rules mean that your original definition of ndnp is equivalent to:

    ndnp n = [m | m <- (ndnp (n-1)), mod m (head $ ndnp (m-1)) /= 0]
              ^   ^                      ^               ^
    

    That is, your new definition of n doesn't capture the n in the first occurrence of ndnp (n-1), but it does capture all the other ns in the comprehension.

    When you try to evaluate ndnp 1 you get:

    ndnp 1 = [m | m <- ndnp 0, mod m (head $ ndnp (m-1)) /= 0]
    

    and the list comprehension starts with m = 2 which calls ndnp (m-1) = ndnp 1 within the guard, resulting in an infinite loop.

    You can fix the problem by renaming the newly introduced iteration variable in the list comprehension:

    ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
    

    Observe that substituting n = 1 gives:

    ndnp 1 = [n' | n' <- ndnp 0, mod n' (head $ ndnp 0) /= 0]
    

    and then substituting ndnp 0 = nums0 gives:

    ndnp 1 = [n' | n' <- nums0, mod n' (head nums0) /= 0]
    

    which precisely matches your definition of nums1 up to variable substitution.

    So, there's no violation of referential transparency in this example.

    The fix you found:

    ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
         where arr = ndnp (n-1)
    

    is scoped differently. The Haskell scoping rules mean that it's equivalent to:

    ndnp n = [m | m <- arr, mod m (head $ arr) /= 0]
              ^   ^             ^
         where arr = ndnp (n-1)
    

    See how the new variable m captures every occurrence of m in the list comprehension, but doesn't capture the n in the definition of arr? This means that this is equivalent to:

    ndnp n = [m | m <- ndnp (n-1), mod m (head $ ndnp (n-1)) /= 0]
              ^   ^                    ^
    

    where neither ndnp (n-1) is captured, making it equivalent to my "fixed" version above, up to variable renaming:

    ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
              ^^   ^^                      ^^