With new clojure 1.7 I decided to understand where I can use transducers. I understand what benefit they can give, but I can't find normal examples of writing custom transducers with explanation.
Ok, I tried to test what is happening. I opened the clojure documentation. And there examples use xf
as argument. First: what means this xf or xfrom?
This stuff produced identity transducer.
(defn my-identity [xf]
(fn
([]
(println "Arity 0.")
(xf))
([result]
(println "Arity 1: " result " = " (xf result))
(xf result))
([result input]
(println "Arity 2: " result input " = " (xf result input))
(xf result input))))
I took the naming of variables [result input]
from documentation example.
I thought it's like in reduce function where result
is reduced part and input
is new collection element.
So when I make (transduce my-identity + (range 5))
I got result 10
what I was expecting.
Then I read about eduction
, but I can't understand what is it. Anyway
I made (eduction my-identity (range 5))
and got:
Arity 2: nil 0 = nil
Arity 2: nil 1 = nil
Arity 1: nil = nil
(0 0 1 1)
Every item got duplicated because I call xf
in println
statement.
Why it duplicated every item twice?
Why I got nil?
Will I always get nil while making an eduction?
Can I relay on this behavior?
Anyway I did
> (reduce + (eduction my-identity (range 5))
clojure.core.Eduction cannot be cast to clojure.lang.IReduce
Ok, the result is a Eduction
that is NOT reducible, but printed like a list. Why it is not reducible? When I type (doc eduction)
I get that
Returns a reducible/iterable application of the transducers
to the items in coll.
Shouldn't (transduce xform f coll)
and (reduce f (eduction xfrom coll))
be the same?
I made
> (reduce + (sequence my-identity (range 5))
20
Of course I got 20
because of duplicates. Again I thought it should be
that (transduce xform f coll)
and (reduce f (sequence xfrom coll))
be
always equal at least in such small example without any stateful transducers. This is stupid that they are not, or I'm wrong?
Ok, then I tried (type (sequence my-identity (range 5)))
and get
clojure.lang.LazySeq
I thought, that it's lazy but when I tried to take the first
element
clojure calculated all the sequence at once.
So my summary:
1) What means xf or xform?
2) Why I get nil
as a result
argument while eduction
or sequence
?
3) Could I always be sure that it will be nil
while eduction
or sequence
?
4) What is eduction
and what is the idiomatic idea it's not reducible? Or if it is, then how I can reduce it?
5) Why I get side effects while sequence
or eduction
?
6) Can I create actual lazy sequences with transducers?
Many questions, let's first start with a few anwers:
Yes, xf
==xform
is a "transducer".
Your my-identity
function does not compile. You have a parameter and then
multiple other arities of the function. I believe you forgot a (fn ...)
.
Your argument to your identity transducer is called xf
. However, this is
usually called rf
, which means "reducing function". Now the confusing part is
that xf
's are also reducing functions (hence comp
just works). However,
it's confusing that you'd call it xf
and you should call it rf
.
Transducers are usually "constructed" since they may be stateful and/or are
passed parameters. In your case, you don't need to construct it since it's
simple and doesn't have state or even a parameter. However be aware that you'd
usually wrap your function in another fn
returning function. This means you'd
have to call (my-identity)
instead of just passing it as my-identity
.
Again, it's fine here, just slightly unconvential and possibly confusing.
Let's first continue and pretend that your my-identity
transducer is
correct (it's not, and I'll explain later what's going on).
eduction
is relatively rarely used. It creates a "process".
I.e. you can run it over and over again and see the result. Basically, just
like you have lists or vectors that hold your items the eduction will "hold"
the result of the transducer applied. Note that to actually do anything you
still need a rf
(reducing function).
In the beginning I think it is helpful to think of reducing functions
as conj
(or actually conj!
) or in your case +
.
Your eduction
prints the elements it produces since it implements Iterable
which is called by the println
or your REPL. It simply prints out every
element that you add in you transducer with the arity 2 call.
Your call to (reduce + (eduction my-identity (range 5)))
doesn't work
since Eduction
(the object being constructed in eduction
) only implements
IReduceInit
. IReduceInit
as its name suggest does require an initial
value. So this will work: (reduce + 0 (eduction my-identity (range 5)))
Now if you run the above reduce
as I suggest you'll see something very
interesting. It prints 10. Even though your eduction earlier printed (0 0 1 1 2 2 3 3 4 4)
(which if you add together is 20). What's going on here?
As stated earlier, your transducer has a flaw. It doesn't work properly. The
problem is that you call your rf
and then call it a second time again in your
arity 2 function. In clojure, stuff isn't mutable, unless it's somehow
internally mutable for optimization purposes :).
Here the problem is that sometimes clojure uses mutation and you get duplicates
even though you never properly capture the result of your first time you call
(rf)
in your arity 2 function (as the argument to your println
).
Let's fix you function but leave the second rf
call in there:
(defn my-identity2 [rf]
(fn
([]
(println "Arity 0.")
(rf))
([result]
{:post [(do (println "Arity 1 " %) true)]
:pre [(do (println "Arity 1 " result) true)]}
(rf result))
([result input]
{:post [(do (println "Arity 2 " %) true)]
:pre [(do (println "Arity 2 " result input) true)]}
(rf (rf result input) input))))
Note:
xf
to rf
as noted earier.rf
and pass it on to the
second call of rf
. This transducer is not an identity transducer but
doubles every elementObserve carefully:
(transduce my-identity + (range 5));; => 10
(transduce my-identity2 + (range 5));; => 20
(count (into '() my-identity (range 200)));; => 200
(count (into [] my-identity (range 200)));; => 400
(count (into '() my-identity2 (range 200)));; => 400
(count (into [] my-identity2 (range 200)));; => 400
(eduction my-identity (range 5));;=> (0 0 1 1 2 2 3 3 4 4)
(eduction my-identity2 (range 5));;=> (0 0 1 1 2 2 3 3 4 4)
(into '() my-identity (range 5));;=> (4 3 2 1 0)
(into [] my-identity (range 5));;=> [0 0 1 1 2 2 3 3 4 4]
(into '() my-identity2 (range 5));;=> (4 4 3 3 2 2 1 1 0 0)
(reduce + 0 (eduction my-identity (range 5)));;=> 10
(reduce + (sequence my-identity (range 5)));;=> 20
(reduce + 0 (eduction my-identity2 (range 5)));;=> 20
(reduce + (sequence my-identity2 (range 5)));;=> 20
To aswer your questions:
eduction
doesn't really pass nil
as the result
argument when it's being
reduced. It only gets nil when being printed which calls the Iterable
interface.nil
really comes from TransformerIterator
which is a special class
created for transducers. This class is also used for sequence
as you noticed.
As the docs state:The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences.
The reason that you receive nil
as the result
argument is because an iterator has no resulting collection which holds the elements iterated over so far. It simply goes over each element. No state is being accumulated.
You can see the reducing function that is used by the TransformerIterator
as and inner class here:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/TransformerIterator.java
Do a CTRL+f
and enter xf.invoke
to see how your transducer is getting called.
The sequence
function isn't really as lazy as a truly lazy sequence but I
think this explains this part of you question:
Are Clojure transducers eager?
sequence
simply computes the results of a transducer incrementally. Nothing
else.
Lastly, a proper identity function with some debug statements:
(defn my-identity-prop [rf]
(fn
([]
(println "Arity 0.")
(rf))
([result]
(let [r (rf result)]
(println "my-identity(" result ") =" r)
r))
([result input]
(let [r (rf result input)]
(println "my-idenity(" result "," input ") =" r)
r))))