I tried to use directly Clojure's hashmap with MapDB and ran into weird behaviour. I checked Clojure and MapDB sources and couldn't understand the problem.
First everything looks fine:
lein try org.mapdb/mapdb "1.0.6"
; defining a db for the first time
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
.closeOnJvmShutdown
.compressionEnable
.make))
(defonce fruits (.getTreeMap db "fruits-store"))
(do (.put fruits :banana {:qty 2}) (.commit db))
(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> 2
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> true
CTRL-D
=> Bye for now!
Then I try to access the data again:
lein try org.mapdb/mapdb "1.0.6"
; loading previsously created db
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
.closeOnJvmShutdown
.compressionEnable
.make))
(defonce fruits (.getTreeMap db "fruits-store"))
(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> nil
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> false
(class (first (keys (get fruits :banana))))
=> clojure.lang.Keyword
How come the same keyword be different with respect to =
?
Is there some weird reference problem happening ?
The problem is caused by the way equality of keywords works. Looking at the
implementation of the =
function we see that since keywords are not
clojure.lang.Number
or clojure.lang.IPersistentCollection
their equality is
determined in terms of the Object.equals
method. Skimming the source of
clojure.lang.Keyword
we learn that keywords don't not override
Object.equals
and therefore two keywords are equal iff they are the same
object.
The default serializer of MapDB is org.mapdb.SerializerPojo
, a subclass of
org.mapdb.SerializerBase
. In its documentation we can read that
it's a
Serializer which uses ‘header byte’ to serialize/deserialize most of classes from ‘java.lang’ and ‘java.util’ packages.
Unfortunately, it doesn't work that well with clojure.lang
classes; It doesn't
preserve identity of keywords, thus breaking equality.
In order to fix it let's attempt to write our own serializer using the
EDN format—alternatively, you could consider, say, Nippy—and use
it in our MapDB.
(require '[clojure.edn :as edn])
(deftype EDNSeralizer []
;; See docs of org.mapdb.Serializer for semantics.
org.mapdb.Serializer
(fixedSize [_]
-1)
(serialize [_ out obj]
(.writeUTF out (pr-str obj)))
(deserialize [_ in available]
(edn/read-string (.readUTF in)))
;; MapDB expects serializers to be serializable.
java.io.Serializable)
(def edn-serializer (EDNSeralizer.))
(import [org.mapdb DB DBMaker])
(def db (.. (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
closeOnJvmShutdown
compressionEnable
make))
(def more-fruits (.. db
(createTreeMap "more-fruits")
(valueSerializer (EDNSeralizer.))
(makeOrGet)))
(.put more-fruits :banana {:qty 2})
(.commit db)
Once the more-fruits
tree map is reopened in a JVM with EDNSeralizer
defined
the :qty
object stored inside will be the same object as any other :qty
instance. As a result equality checks will work properly.