clojuretransducer

Clojure transducers behavior


With new clojure 1.7 I decided to understand where I can use transducers. I understand what benefit they can give, but I can't find normal examples of writing custom transducers with explanation.

Ok, I tried to test what is happening. I opened the clojure documentation. And there examples use xf as argument. First: what means this xf or xfrom? This stuff produced identity transducer.

(defn my-identity [xf]
  (fn 
   ([]
     (println "Arity 0.")
     (xf))
   ([result]
     (println "Arity 1: " result " = " (xf result)) 
     (xf result))
   ([result input]
     (println "Arity 2: " result input " = " (xf result input)) 
     (xf result input))))

I took the naming of variables [result input] from documentation example. I thought it's like in reduce function where result is reduced part and input is new collection element.

So when I make (transduce my-identity + (range 5)) I got result 10 what I was expecting. Then I read about eduction, but I can't understand what is it. Anyway I made (eduction my-identity (range 5)) and got:

Arity 2:  nil 0  =  nil
Arity 2:  nil 1  =  nil
Arity 1: nil  =  nil
(0 0 1 1)

Every item got duplicated because I call xf in println statement. Why it duplicated every item twice? Why I got nil? Will I always get nil while making an eduction? Can I relay on this behavior?

Anyway I did

> (reduce + (eduction my-identity (range 5))
clojure.core.Eduction cannot be cast to clojure.lang.IReduce

Ok, the result is a Eduction that is NOT reducible, but printed like a list. Why it is not reducible? When I type (doc eduction) I get that

Returns a reducible/iterable application of the transducers
to the items in coll.

Shouldn't (transduce xform f coll) and (reduce f (eduction xfrom coll)) be the same?

I made

> (reduce + (sequence my-identity (range 5))
20

Of course I got 20 because of duplicates. Again I thought it should be that (transduce xform f coll) and (reduce f (sequence xfrom coll)) be always equal at least in such small example without any stateful transducers. This is stupid that they are not, or I'm wrong?

Ok, then I tried (type (sequence my-identity (range 5))) and get clojure.lang.LazySeq I thought, that it's lazy but when I tried to take the first element clojure calculated all the sequence at once.

So my summary:

1) What means xf or xform?

2) Why I get nil as a result argument while eduction or sequence?

3) Could I always be sure that it will be nil while eduction or sequence?

4) What is eduction and what is the idiomatic idea it's not reducible? Or if it is, then how I can reduce it?

5) Why I get side effects while sequence or eduction?

6) Can I create actual lazy sequences with transducers?


Solution

  • Many questions, let's first start with a few anwers:

    1. Yes, xf==xform is a "transducer".

    2. Your my-identity function does not compile. You have a parameter and then multiple other arities of the function. I believe you forgot a (fn ...).

    3. Your argument to your identity transducer is called xf. However, this is usually called rf, which means "reducing function". Now the confusing part is that xf's are also reducing functions (hence comp just works). However, it's confusing that you'd call it xf and you should call it rf.

    4. Transducers are usually "constructed" since they may be stateful and/or are passed parameters. In your case, you don't need to construct it since it's simple and doesn't have state or even a parameter. However be aware that you'd usually wrap your function in another fn returning function. This means you'd have to call (my-identity) instead of just passing it as my-identity. Again, it's fine here, just slightly unconvential and possibly confusing.

    5. Let's first continue and pretend that your my-identity transducer is correct (it's not, and I'll explain later what's going on).

    6. eduction is relatively rarely used. It creates a "process". I.e. you can run it over and over again and see the result. Basically, just like you have lists or vectors that hold your items the eduction will "hold" the result of the transducer applied. Note that to actually do anything you still need a rf (reducing function).

    7. In the beginning I think it is helpful to think of reducing functions as conj (or actually conj!) or in your case +.

    8. Your eduction prints the elements it produces since it implements Iterable which is called by the println or your REPL. It simply prints out every element that you add in you transducer with the arity 2 call.

    9. Your call to (reduce + (eduction my-identity (range 5))) doesn't work since Eduction (the object being constructed in eduction) only implements IReduceInit. IReduceInit as its name suggest does require an initial value. So this will work: (reduce + 0 (eduction my-identity (range 5)))

    10. Now if you run the above reduce as I suggest you'll see something very interesting. It prints 10. Even though your eduction earlier printed (0 0 1 1 2 2 3 3 4 4) (which if you add together is 20). What's going on here?

    11. As stated earlier, your transducer has a flaw. It doesn't work properly. The problem is that you call your rf and then call it a second time again in your arity 2 function. In clojure, stuff isn't mutable, unless it's somehow internally mutable for optimization purposes :). Here the problem is that sometimes clojure uses mutation and you get duplicates even though you never properly capture the result of your first time you call (rf) in your arity 2 function (as the argument to your println).

    Let's fix you function but leave the second rf call in there:

      (defn my-identity2 [rf]
        (fn
          ([]
           (println "Arity 0.")
           (rf))
          ([result]
           {:post [(do (println "Arity 1 " %) true)]
            :pre  [(do (println "Arity 1 " result) true)]}
           (rf result))
          ([result input]
           {:post [(do (println "Arity 2 " %) true)]
            :pre  [(do (println "Arity 2 " result input) true)]}
           (rf (rf result input) input))))
    

    Note:

    Observe carefully:

    (transduce my-identity + (range 5));; => 10
    (transduce my-identity2 + (range 5));; => 20
    
    (count (into '() my-identity (range 200)));; => 200
    (count (into  [] my-identity (range 200)));; => 400
    
    (count (into '() my-identity2 (range 200)));; => 400
    (count (into  [] my-identity2 (range 200)));; => 400
    
    (eduction my-identity  (range 5));;=> (0 0 1 1 2 2 3 3 4 4)
    (eduction my-identity2 (range 5));;=> (0 0 1 1 2 2 3 3 4 4)
    
    (into '() my-identity  (range 5));;=> (4 3 2 1 0)
    (into  [] my-identity  (range 5));;=> [0 0 1 1 2 2 3 3 4 4]
    (into '() my-identity2 (range 5));;=> (4 4 3 3 2 2 1 1 0 0)
    
    
    (reduce + 0 (eduction my-identity (range 5)));;=> 10
    (reduce +   (sequence my-identity (range 5)));;=> 20
    
    (reduce + 0 (eduction my-identity2 (range 5)));;=> 20
    (reduce +   (sequence my-identity2 (range 5)));;=> 20
    

    To aswer your questions:

    1. eduction doesn't really pass nil as the result argument when it's being reduced. It only gets nil when being printed which calls the Iterable interface.
    2. The nil really comes from TransformerIterator which is a special class created for transducers. This class is also used for sequence as you noticed. As the docs state:

    The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences.

    The reason that you receive nil as the result argument is because an iterator has no resulting collection which holds the elements iterated over so far. It simply goes over each element. No state is being accumulated.

    You can see the reducing function that is used by the TransformerIterator as and inner class here:

    https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/TransformerIterator.java

    Do a CTRL+f and enter xf.invoke to see how your transducer is getting called.

    The sequence function isn't really as lazy as a truly lazy sequence but I think this explains this part of you question:

    Are Clojure transducers eager?

    sequence simply computes the results of a transducer incrementally. Nothing else.

    Lastly, a proper identity function with some debug statements:

    (defn my-identity-prop [rf]
      (fn
        ([]
         (println "Arity 0.")
         (rf))
        ([result]
         (let [r (rf result)]
           (println "my-identity(" result ") =" r)
           r))
        ([result input]
         (let [r (rf result input)]
           (println "my-idenity(" result "," input ") =" r)
           r))))