clojurelazy-sequences

Newbie Problem Understanding Clojure Lazy Sequences


I've just started learning Clojure and I'm puzzled by how lazy sequences work. In particular, I don't understand why these 2 expressions produce different results in the repl:

;; infinite range works OK
(user=> (take 3 (map #(/(- % 5)) (range)))
(-1/5 -1/4 -1/3)

;; finite range causes error
user=> (take 3 (map #(/(- % 5)) (range 1000)))
Error printing return value (ArithmeticException) at clojure.lang.Numbers/divide (Numbers.java:188).
Divide by zero

I take the sequence of integers (0 1 2 3 ...) and apply a function that subtracts 5 and then takes the reciprocal. Obviously this causes a division-by-zero error if it's applied to 5. But since I'm only taking the first 3 values from a lazy sequence I wasn't expecting to see an exception.

The results are what I expected when I use all the integers, but I get an error if I use the first 1000 integers.

Why are the results different?


Solution

  • Clojure 1.1 introduced "chunked" sequences,

    This can provide greater efficiency ... Consumption of chunked-seqs as normal seqs should be completely transparent. However, note that some sequence processing will occur up to 32 elements at a time. This could matter to you if you are relying on full laziness to preclude the generation of any non-consumed results. [Section 2.3 of "Changes to Clojure in Version 1.1"]

    In your example (range) seems to be producing a seq that realizes one element at a time and (range 999) is producing a chunked seq. map will consume a chunked seq a chunk at a time, producing a chunked seq. So when take asks for the first element of the chunked seq, function passed to map is called 32 times on the values 0 through 31.

    I believe it is wisest to code in such a way the code will still work for any seq producing function/arity if that function produces a chunked seq with an arbitrarily large chunk.

    I do not know if one writes a seq producing function that is not chunked if one can rely in current and future versions of library functions like map and filter to not convert the seq into a chunked seq.

    But, why the difference? What are the implementation details such that (range) and (range 999) are different in the sort of seq produced?

    1. Range is implemented in clojure.core.
    2. (range) is defined as (iterate inc' 0).
    3. Ultimately iterate's functionality is provided by the Iterate class in Iterate.java.
    4. (range end) is defined, when end is a long, as (clojure.lang.LongRange/create end)
    5. The LongRange class lives in LongRange.java.

    Looking at the two java files it can be seen that the LongRange class implements IChunkedSeq and the Iterator class does not. (Exercise left for the reader.)

    Speculation

    1. The implementation of clojure.lang.Iterator does not chunk because iterator can be given a function of arbitrary complexity and the efficiency from chunking can easily be overwhelmed by computing more values than needed.
    2. The implementation of (range) relies on iterator instead of a custom optimized Java class that does chunking because the (range) case is not believed to be common enough to warrant optimization.