haskellprimessieve-of-eratostheneslazy-sequencesspace-leak# double stream feed to prevent unneeded memoization?

I'm new to Haskell and I'm trying to implement Euler's Sieve in stream processing style.

When I checked the Haskell Wiki page about prime numbers, I found some mysterious optimization technique for streams.
In **3.8 Linear merging** of that wiki:

```
primesLME = 2 : ([3,5..] `minus` joinL [[p*p, p*p+2*p..] | p <- primes'])
where
primes' = 3 : ([5,7..] `minus` joinL [[p*p, p*p+2*p..] | p <- primes'])
joinL ((x:xs):t) = x : union xs (joinL t)
```

And it says

“

The double primes feed is introduced here to prevent unneeded memoization and thus prevent memory leak, as per Melissa O'Neill's code.”

How could this be? I can't figure out how it works.

Solution

Normally, definition of primes stream in Richard Bird's formulation of the sieve of Eratosthenes is self-referential:

```
import Data.List.Ordered (minus, union, unionAll)
ps = 2 : minus [3..] -- `:` is "cons"
(foldr (\p r -> p*p : union [p*p+p, p*p+2*p..] r) [] ps)
```

The primes `ps`

produced by this definition are used as the input to it. To prevent a *vicious circle* the definition is primed with the initial value, 2. This corresponds to the mathematical definition of the sieve of Eratosthenes as finding primes in the gaps between the composites, enumerated for each prime *p* by counting up in steps of *p*,

**P** = { *2* } **∪** ( { *3,4,5,*... } \ **⋃**_{p in P} { *p ^{2}*,

The produced stream is used as the input in its own definition. This causes the retention of the whole stream of primes in memory (or most of it anyway). The fixpoint here is *sharing*, *corecursive*:

```
fix f = xs where xs = f xs -- a sharing fixpoint combinator
ps = fix ((2:) . minus [3..] . foldr (...) [])
-- = xs where xs = 2 : minus [3..] (foldr (...) [] xs)
```

* The idea* (due to Melissa O'Neill) is, then, to separate this into

```
fix2 f = f xs where xs = f xs -- double-staged fixpoint combinator
ps2 = fix2 ((2:) . minus [3..] . foldr (...) [])
-- = 2 : minus [3..] (foldr (...) [] xs) where
-- xs = 2 : minus [3..] (foldr (...) [] xs)
```

Thus, when `ps2`

produces some prime `p`

, its inner stream `xs`

of *"core"* primes needs only be instantiated up to about `sqrt p`

, and any primes which are produced by `ps2`

can get discarded and garbage-collected by the system immediately afterwards:

\ \ <- ps2 <-. \ \ <- xs <-. / \ \_________/

Primes produced by the inner loop `xs`

can not be immediately discarded, because they are needed for `xs`

stream itself. When `xs`

has produced a prime `q`

, only its part below `sqrt q`

can be discarded, just after it has been consumed by the `foldr`

part of the computation. In other words this sequence maintains back pointer into itself down to the `sqrt`

of its biggest produced value (as it is being consumed by *its* consumer, like `print`

).

So with one feed loop (with `fix`

) almost the whole sequence would have to be retained in memory, while with the double feed (with `fix2`

) only the inner loop needs to be mostly retained that only reaches up to the square root of the current value produced by the main stream. Thus the overall space complexity is reduced from about *O(N)* to about *O(sqrt(N))* -- a drastic reduction.

For this to work the code must be compiled with optimizations, i.e. with the `-O2`

switch, and run standalone. You may also have to use `-fno-cse`

switch. And there must be only one reference to `ps2`

in the testing code:

```
main = getLine >>= (read >>> (+(-1)) >>> (`drop` ps2) >>> print . take 5)
```

In fact, when tested at Ideone, it does show a practically constant memory consumption.

**And it's the Sieve of Eratosthenes, not Euler's sieve.**

The initial definitions are:

```
eratos (x:xs) = x : eratos (minus xs $ map (*x) [x..] ) -- ps = eratos [2..]
eulers (x:xs) = x : eulers (minus xs $ map (*x) (x:xs)) -- ps = eulers [2..]
```

Both are very inefficient because of the premature handling of the multiples. It is easy to remedy the *first* definition by fusing the `map`

and the enumeration into one enumeration moved further away (from `x`

to `x*x`

, i.e. `[x*x, x*x+x..]`

), so that its handling can be * postponed* -- because here each prime's multiples are generated

```
eratos (p:ps) xs | (h,t) <- span (< p*p) xs = -- ps = 2 : eratos ps [2..]
h ++ eratos ps (minus t [p*p, p*p+p..]) -- "postponed sieve"
```

which is the same as Bird's sieve at the top of this post, *segment-wise*:

```
ps = 2 : [n | (r:q:_, px) <- (zip . tails . (2:) . map (^2) <*> inits) ps,
n <- [r+1..q-1] `minus` foldr union []
[[s+p, s+2*p..q-1] | p <- px, let s = r`div`p*p]]
```

(`(f <*> g) x = f x (g x)`

is used here as a pointfree shorthand.)

There is no easy fix for the second definition, i.e. `eulers`

.

*addition:* you can see the same idea implemented with Python generators, for comparison, here.

In fact, that Python code employs a *telescopic, multistage* recursive production of ephemeral primes streams; in Haskell we can arrange for it with the non-sharing, multi-staged fixpoint combinator ** _Y**:

```
primes = 2 : _Y ((3:) . sieve 5 . unionAll . map (\p -> [p*p, p*p+2*p..]))
where
_Y g = g (_Y g) -- == g . g . g . g . ....
sieve k s@(x:xs) | k < x = k : sieve (k+2) s -- == [k,k+2..] \\ s,
| True = sieve (k+2) xs -- when s ⊂ [k,k+2..]
```

- How can I refactor from error to an ExceptT?
- How can I generate a random sequence of elements from a list in Haskell?
- Problem with quantified types and pattern matching
- Why is the `unicode-show` package necessary?
- Get sum of int or integer in Haskell
- Java 8: streams and the Sieve of Eratosthenes
- Can iterate be written with a fold?
- Haskell function with data pattern match and second argument gives Equations have different numbers of arguments
- How to use (->) instances of Monad and confusion about (->)
- Trouble understanding Haskell type unification with a nested `fmap`
- why ghc does not support PIE and Full RelRO in linux?
- Controlling Wai logger messages
- How do Haskell compilers decide whether to allocate on the heap or the stack?
- How can date/time format of Yesod logger be configured?
- In Haskell's Turtle library, how to copy a file, but preserve the file date
- Could not deduce (Semigroup (TimedEvents a))
- Instance of class with Data family yielding error "Couldn't match expected type"
- Is there a way to get a Haskell setup on Windows without an installation? (Copy + paste)
- Haskell parse error on input `<-'
- Inline assembly in Haskell
- How do I deal with arbitrary length tuple to build up a complex SQL query for Haskell's postgresql-simple's query function?
- Extract liftIO and runSql in separate function (Haskell)
- How to query NodeJS stream 'meta data'?
- Building multiple executables in the default Haskell Stack project
- Take elements of a list up to and including the first value that satisfies some predicate in Haskell
- How are Hackage package names mapped to 'cabal install' names?
- Insert entity into DB with manually created foreign key in Persistent library (Haskell)
- couldn´t match type 'IO´ with ´[]` inside a do
- Why doesn't contravariance apply in certain cases like [b] → Int < a → Int
- How does the Haskell compiler handle the 'where' statement?