algorithmhaskell

Haskell primality test optimization. Eratosthenes sieve ran slower than direct solving


Using this code to identify primes:

sieve:: [Integer] -> [Integer]
sieve [] = []
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

isPrime' :: Integer -> Bool
isPrime' 2 = True
isPrime' x = x == last (sieve [2..x])

isPrime :: Integer -> Bool
isPrime 2 = True
isPrime p = p > 2 && (all (\n -> p `mod` n /= 0) $ takeWhile (\n -> n * n <= p) [3, 5 ..])

the function isPrime' runs much slower and takes more space than isPrime. Using elem instead of last also runs slowly.

ghci> isPrime' 10007
True
(0.33 secs, 181,461,088 bytes)
ghci> isPrime 10007
True
(0.01 secs, 92,120 bytes)

How can I make isPrime' run faster? I thought it might be lazy evaluation, but I have no idea how to fix it.


Solution

  • First of all, there is a small error in the isPrime function, indeed, it will return True for all even number greater than 2, like:

    ghci> isPrime 4
    True
    

    but we can fix that with:

    isPrime :: Integer -> Bool
    isPrime p = p == 2 || (odd p && (all (\n -> p `mod` n /= 0) $ takeWhile (\n -> n * n <= p) [3, 5 ..]))
    

    It is not that surprising that isPrime' is slow: it requires constructing and list of primes up to that number, which runs in O(n×p) with n the number to check, and p the number of primes up to that number. Indeed, we first have a list of all integers up to that number, then we filter out all items dividable by two, then we enumerate once again to check filter out items dividable by three, then we enumerate over the remaining list to filter out items dividable by five, and so on. Although the list to filter is always smaller, it still builds up a lot of modulo checks: the element 11 for example will be checked if it is modulo 2, 3, 5 and 7. So it performs a lot of work to get the next element. We need to do these checks, even if we want to know if 17 is a prime number. Whereas the isPrime function does not first has to validate that 11 is prime, it checks if 17 is prime.

    The isPrime on the other hand, runs in O(√n) with n the number to check, and all these checks are performed on the same number: the number that we want to check. So it is a more straightforward way to check if something is prime.

    I thought it might be lazy evaluation, but I have no idea how to fix it.

    No, the lazy evaluation will, at most, have a small effect, the bookkeeping of lazy functions indeed has some overhead, but typically is not that much.

    The sieve method can be improved, because if p is a prime, it is sufficient to filter out p2, p2+p, p2+2p, p2+3p, etc. so this can reduce modulo checks significantly. But still likely performs a lot of extra work. After all, the sieve method is mainly used to generate a list of all primes (up to a certain upperbound), not to check if an individual element is prime. The sieve concurrently checks if a lot of elements are prime, and when an element p is validated to be prime this helps determining if the next elements are prime.