I'm writing a recursive infinite prime number generator, and I'm almost sure it can be optimized further.
Right now, aside from a lookup table of the first dozen primes, each call to the recursive function receives a list of all previous primes.
Since it's a lazy generator, right now I'm just filtering out any number that is divisible by any of the previous primes, and taking the first unfiltered result. (The check I'm using short-circuits, so the first time a previous prime divides the current number evenly it aborts with that information.)
Right now, my performance degrades noticeably around the 4,000th prime (37,813). I'm looking for ways to use the unique fact that I have a list of all prior primes, and am only searching for the next one, to improve my filtering algorithm. (Most information I can find offers non-lazy sieves that find all primes under a limit, or ways to find p_n given only p_(n-1), not optimizations for finding p_n given every prime from 2 through p_(n-1).)
For example, I know that p_n must reside in the range (p_(n-1) + 1) .. (p_(n-1) + p_(n-2)). Right now I start my filtering of integers at p_(n-1) + 2 (since p_(n-1) + 1 can only be prime for p_(n-1) = 2, which is precomputed). But since this is a lazy generator, knowing the upper bound of the range (p_(n-1) + p_(n-2)) doesn't help me filter anything.
What can I do to filter more effectively given all previous primes?
@doc """
Creates an infinite stream of prime numbers.
iex> Enum.take(primes, 5)
[2, 3, 5, 7, 11]
iex> Enum.take_while(primes, fn(n) -> n < 25 end)
[2, 3, 5, 7, 11, 13, 17, 19, 23]
"""
@spec primes :: Stream.t
def primes do
Stream.unfold( [], fn primes ->
next = next_prime(primes)
{ next, [next | primes] }
end )
end
defp next_prime([]), do: 2
defp next_prime([2 | _]), do: 3
defp next_prime([3 | _]), do: 5
defp next_prime([5 | _]), do: 7
# ... etc
defp next_prime(primes) do
start = Enum.first(primes) + 2
Enum.first(
Stream.drop_while(
Integer.stream(from: start, step: 2),
fn number ->
Enum.any?(primes, fn prime ->
rem(number, prime) == 0
end )
end
)
)
end
The primes function starts with an empty list, gets the next prime for it (2 initially), and then 1) emits it from the Stream and 2) adds it to the top of the primes stack used in the next call. (I'm sure this stack is the source of some slowdown.)
The next_prime function takes in that stack. Starting from the last known prime + 2, it creates an infinite stream of integers, drops every integer that divides evenly by any known prime in the list, and then returns the first integer that survives.
This is, I suppose, something similar to a lazy, incremental Sieve of Eratosthenes.
You can see some basic attempts at optimization: I start checking at p_(n-1) + 2, and I step over even numbers.
I tried a more literal Sieve of Eratosthenes by threading the Integer.stream through each calculation and, after finding a prime, wrapping it in a new Stream.drop_while that filtered out just that prime's multiples. But since Streams are implemented as anonymous functions, that mutilated the call stack.
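Roughly, that abandoned attempt had the following shape (a reconstructed sketch, not my exact code; Stream.reject stands in here for the per-prime filtering):

# Each newly found prime wraps the remaining candidate stream in one more
# filter that rejects its multiples, so every prime adds another layer of
# anonymous functions around the stream, which is what wrecked the call stack.
defp next_prime_and_rest(candidates) do
  prime = Enum.at(candidates, 0)
  rest = Stream.reject(candidates, fn n -> rem(n, prime) == 0 end)
  {prime, rest}
end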
It's worth noting that I'm not assuming you need all prior primes to generate the next one. I just happen to have them around, thanks to my implementation.
I wrote a program that generates the prime numbers in order, without limit, and used it to sum the first billion primes at my blog. The algorithm uses a segmented Sieve of Eratosthenes; additional sieving primes are calculated at each segment, so the process can continue indefinitely, as long as you have space to store the sieving primes. Here's pseudocode:
function init(delta) # Sieve of Eratosthenes over the odd numbers 3 .. 2*delta - 1
    m, ps, qs := 0, [], []
    sieve := makeArray(delta, True)   # sieve[i] stands for the odd number 2*i + 1
    sieve[0] := False                 # 1 is not prime
    for p from 3 to 2*delta - 1 step 2
        if sieve[(p-1) / 2]
            m := m + 1; ps.insert(p)  # insert prepends, so ps[0] is the largest prime so far
            qs.insert(p + (p-1) / 2)  # index of 3*p, the first multiple to strike
            for i from (p*p - 1) / 2 to delta - 1 step p
                sieve[i] := False
    return m, ps, qs, sieve
function advance(m, ps, qs, sieve, bottom, delta)
    for i from 0 to delta - 1          # reset the segment
        sieve[i] := True
    for i from 0 to m - 1              # shift each offset into the new segment
        qs[i] := (qs[i] - delta) % ps[i]   # % is the non-negative modulus
    p := ps[0] + 2                     # ps[0] is the largest sieving prime found so far
    while p * p <= bottom + 2 * delta  # extend the sieving primes as needed
        if isPrime(p) # trial division
            m := m + 1; ps.insert(p)
            qs.insert((p*p - bottom - 1) / 2)
        p := p + 2
    for i from 0 to m - 1              # strike the composites
        for j from qs[i] to delta - 1 step ps[i]
            sieve[j] := False
    return m, ps, qs, sieve
Here ps is the list of sieving primes less than the current maximum and qs is the offset of the smallest multiple of the corresponding ps in the current segment. The advance function clears the bitarray, resets qs, extends ps and qs with new sieving primes, then sieves the next segment.
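To make the qs bookkeeping concrete, here is a small worked example in iex (my numbers, not from the blog program; Integer.mod gives the non-negative remainder that the % above assumes):

iex> delta = 50_000
50000
iex> Integer.mod(4 - delta, 3)    # p = 3 starts at qs = 3 + (3-1)/2 = 4, the index of 9
2
iex> 2 * delta + 2 * 2 + 1        # the odd number at offset 2 of the next segment
100005
iex> rem(100_005, 3)              # indeed the first odd multiple of 3 past 100,000
0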
function genPrimes()
    bottom, i, delta := 0, 1, 50000
    m, ps, qs, sieve := init(delta)
    yield 2
    while True
        if i == delta # reset for next segment
            i, bottom := -1, bottom + 2 * delta
            m, ps, qs, sieve := advance(m, ps, qs, sieve, bottom, delta)
        else if sieve[i] # found prime
            yield bottom + 2*i + 1
        i := i + 1
The segment size 2 * delta is arbitrarily set to 100000. This method requires O(sqrt(n)) space for the sieving primes plus constant space for the sieve.
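If you want this behind the same Stream interface as the question, a rough Elixir rendering might look like the module below. It is a sketch, not the program from my blog: to keep it short it does not extend the sieving primes per segment, but fixes them to the odd primes below 2*delta, so it is only valid for primes up to about (2*delta)^2 (roughly 10^10 with the default delta), and it uses Erlang's :array for the per-segment sieve.

defmodule SegmentedPrimes do
  @default_delta 50_000

  def primes(delta \\ @default_delta) do
    Stream.resource(
      fn -> {init(delta), 0} end,
      fn {{ps, qs, sieve}, bottom} ->
        # emit every survivor of the current segment, then prepare the next one
        found = for i <- 0..(delta - 1), :array.get(i, sieve), do: bottom + 2 * i + 1
        found = if bottom == 0, do: [2 | found], else: found
        {found, {advance(ps, qs, delta), bottom + 2 * delta}}
      end,
      fn _ -> :ok end
    )
  end

  # Simple sieve over the odd numbers 3 .. 2*delta - 1; index i stands for the
  # odd number 2*i + 1. Returns the sieving primes, the index of each prime's
  # first multiple to strike (3*p), and the first segment's sieve.
  defp init(delta) do
    blank = :array.set(0, false, :array.new(delta, default: true))

    {ps, qs, sieve} =
      Enum.reduce(1..(delta - 1), {[], [], blank}, fn i, {ps, qs, sieve} ->
        if :array.get(i, sieve) do
          p = 2 * i + 1
          {[p | ps], [i + p | qs], strike(sieve, div(p * p - 1, 2), delta, p)}
        else
          {ps, qs, sieve}
        end
      end)

    {Enum.reverse(ps), Enum.reverse(qs), sieve}
  end

  # Shift every offset into the next segment and strike the composites there.
  defp advance(ps, qs, delta) do
    qs = Enum.map(Enum.zip(qs, ps), fn {q, p} -> Integer.mod(q - delta, p) end)

    sieve =
      Enum.zip(ps, qs)
      |> Enum.reduce(:array.new(delta, default: true), fn {p, q}, s ->
        strike(s, q, delta, p)
      end)

    {ps, qs, sieve}
  end

  # Mark sieve[from], sieve[from + step], ... (indices below delta) as composite.
  defp strike(sieve, from, delta, _step) when from >= delta, do: sieve

  defp strike(sieve, from, delta, step) do
    strike(:array.set(from, false, sieve), from + step, delta, step)
  end
end

Enum.take(SegmentedPrimes.primes(), 10) gives [2, 3, 5, 7, 11, 13, 17, 19, 23, 29], and Enum.at(SegmentedPrimes.primes(), 3_999) comes back as 37813, the prime quoted in the question. Because BEAM data is immutable, each :array.set copies a few tree nodes rather than flipping a bit in place; an ETS table or an :atomics array would be closer in spirit to the constant-space bit array in the pseudocode.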
It is slower but saves space to generate candidates with a wheel and test the candidates for primality.
function genPrimes()
    w, wheel := 0, [1,2,2,4,2,4,2,4,6,2,6,4,2,4, \
        6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6, \
        2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
    p := 2; yield p
    repeat
        p := p + wheel[w]
        if w == 51 then w := 4 else w := w + 1
        if isPrime(p) yield p
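In Elixir, that wheel generator might look roughly like this (my sketch; the gap list is copied verbatim from the pseudocode above, and plain trial division stands in for isPrime):

defmodule WheelPrimes do
  # Gaps 1, 2, 2, 4 walk 2 -> 3 -> 5 -> 7 -> 11; the remaining 48 entries are
  # the repeating cycle of the 2*3*5*7 = 210 wheel, re-entered at index 4.
  @wheel {1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4,
          6, 6, 2, 6, 4, 2, 6, 4, 6, 8, 4, 2, 4, 2, 4, 8, 6, 4, 6,
          2, 4, 6, 2, 6, 6, 4, 2, 4, 6, 2, 6, 4, 2, 4, 2, 10, 2, 10}

  def primes do
    Stream.unfold({2, 0}, fn {p, w} -> {p, next(p, w)} end)
  end

  # Step the candidate along the wheel until one passes trial division.
  defp next(p, w) do
    candidate = p + elem(@wheel, w)
    w = if w == 51, do: 4, else: w + 1
    if prime?(candidate), do: {candidate, w}, else: next(candidate, w)
  end

  def prime?(n) when n in [2, 3], do: true

  def prime?(n) do
    Enum.all?(2..trunc(:math.sqrt(n)), fn d -> rem(n, d) != 0 end)
  end
end

Enum.take(WheelPrimes.primes(), 10) gives [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]. The only state carried between primes is the candidate and a wheel index, which is where the space saving comes from; the cost is a full trial division for every surviving candidate.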
It may be useful to begin with a sieve and switch to a wheel when the sieve grows too large. Better still, once the set of sieving primes grows too large, keep sieving with a fixed set of them and report only those values bottom + 2*i + 1 that also pass a primality test.
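Combining the two sketches above gives one way to read that last suggestion: the fixed-set SegmentedPrimes stream never strikes out a prime, so it stays correct indefinitely if every survivor is confirmed by a primality test. With the naive trial division from WheelPrimes standing in for a real test (a Miller-Rabin test would be the practical choice):

SegmentedPrimes.primes() |> Stream.filter(&WheelPrimes.prime?/1)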