First up, despite this being an implementation of the Sieve of Eratosthenes this is not a homework question. I can find implementations of the Sieve in many intro books :).
The question I have is that around my confusion around referential transparency, which was one of the virtues of functional programming.
My understanding was that I could replace f(x) anywhere in my program with y, provided y = f(x) and I was in a pure function.
A naive implementation of the n-th prime using the Sieve "by hand" is
nums0 = [2..] -- allowed by lazy evaluation
nums1 = [n | n <- nums0, mod n (head nums0) /= 0] -- works, gives [3,5,7,9,11,13,15...]
nums2 = [n | n <- nums1, mod n (head nums1) /= 0] -- works, gives [5,7,11,13,...]
nums3 = [n | n <- nums2, mod n (head nums2) /= 0] -- works, gives [7,11,13,...]
It is clear that there is a recursive pattern here. Let's call ndnp m the function that returns the list of natural numbers greater than 1 that are not divisible by the first m primes i.e.
nums0 = ndnp 0 -- (i.e. all numbers starting at 2)
nums1 = ndnp 1 -- (not divisible by 2)
nums2 = ndnp 2 -- (not divisible by 2 or 3)
To find the Nth prime, we can simply do head $ ndnp (n-1).
The example of constructing the numsX array has a recursive structure:
-- ndnp: Not Divisible by first N Primes
ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]
In particular, ndnp 1 is built from ndnp 0 or nums0
When run take 10 $ ndnp 0 I get the answer I expect. When I run take 10 $ ndnp 1 I get a stackoverflow.
My question:
a = ndnp 1 is legal. Evaluation is lazy, which I understand.take, I get a stack overflow (which suggests it is trying to evaluate the whole structure). I could understand if mod or something forced eager evaluation, but I know this isn't the case because ...take 10 nums1 works just fine, and it is build off the same steps of looping through an infinite, lazily eval'ed list!nums0 = ndnp 0, why is the execution of take 10 $ nums1 fine (which uses nums0 and iterates over it), but take 10 $ ndnp 1 broken (which uses the result of ndnp 0 and iterates over it)? Referrential transparency suggests I should be able to swap one out for the other, but in one case I retain lazy evaluation and the other generates a stack overflow!Note: There is a fix for this I found
ndnp :: Integer -> [Integer]
ndnp 0 = [2..]
ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
where arr = ndnp (n-1)
which makes my code work, but doesn't give me a good conceptual understanding on when I have referential transparency and when I don't.
If you have warnings turned on (which is good practice -- it's kind of a mystery to me that so many compilers, GHC included, default to not printing warnings), then you'll see something like:
Referential.hs:6:15: warning: [-Wname-shadowing]
This binding for ‘n’ shadows the existing binding
bound at Referential.hs:6:6
|
6 | ndnp n = [n | n <- (ndnp (n-1)), mod n (head $ ndnp (n-1)) /= 0]
^
which will clue you in on what's going wrong. The Haskell scoping rules mean that your original definition of ndnp is equivalent to:
ndnp n = [m | m <- (ndnp (n-1)), mod m (head $ ndnp (m-1)) /= 0]
^ ^ ^ ^
That is, your new definition of n doesn't capture the n in the first occurrence of ndnp (n-1), but it does capture all the other ns in the comprehension.
When you try to evaluate ndnp 1 you get:
ndnp 1 = [m | m <- ndnp 0, mod m (head $ ndnp (m-1)) /= 0]
and the list comprehension starts with m = 2 which calls ndnp (m-1) = ndnp 1 within the guard, resulting in an infinite loop.
You can fix the problem by renaming the newly introduced iteration variable in the list comprehension:
ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
Observe that substituting n = 1 gives:
ndnp 1 = [n' | n' <- ndnp 0, mod n' (head $ ndnp 0) /= 0]
and then substituting ndnp 0 = nums0 gives:
ndnp 1 = [n' | n' <- nums0, mod n' (head nums0) /= 0]
which precisely matches your definition of nums1 up to variable substitution.
So, there's no violation of referential transparency in this example.
The fix you found:
ndnp n = [n | n <- arr, mod n (head $ arr) /= 0]
where arr = ndnp (n-1)
is scoped differently. The Haskell scoping rules mean that it's equivalent to:
ndnp n = [m | m <- arr, mod m (head $ arr) /= 0]
^ ^ ^
where arr = ndnp (n-1)
See how the new variable m captures every occurrence of m in the list comprehension, but doesn't capture the n in the definition of arr? This means that this is equivalent to:
ndnp n = [m | m <- ndnp (n-1), mod m (head $ ndnp (n-1)) /= 0]
^ ^ ^
where neither ndnp (n-1) is captured, making it equivalent to my "fixed" version above, up to variable renaming:
ndnp n = [n' | n' <- (ndnp (n-1)), mod n' (head $ ndnp (n-1)) /= 0]
^^ ^^ ^^