I want to map a large number of tuples. My map looks something like:
{[1 2] :thing}
Except there may be a few million of them. I have a feeling that a tree-map might be a good thing to test so I'm trying to get it working. I can't seem to get the comparison function right though.
(defn compare
[[x y] [xx yy]]
(cond
(and (= x xx) (= y yy)) 0
(and (<= x xx) (<= y yy)) -1
(and (<= x xx) (> y yy)) -1
(and (> x xx) (<= y yy)) 1
(and (> x xx) (> y yy)) 1))
Some trivial inputs seem to work
user=> (compare [1 1] [1 1])
0
user=> (compare [1 1] [2 2])
-1
user=> (compare [1 2] [2 1])
-1
user=> (compare [2 1] [1 2])
1
But if I create inputs that cover all combinations, the map should consider them all different.
(def inputs
"All tuples of [0-4, 5-10]."
(clojure.math.combinatorics/cartesian-product
(range 0 4)
(range 5 10)))
(def input-pairs
"All possible pairs of tuples"
(clojure.math.combinatorics/cartesian-product inputs inputs))
If I test the comparison function, it returns zero only when the two vectors are structurally identical.
user=> (doseq [[a b] input-pairs]
#_=> (when (zero? (compare a b)) (prn a b)))
(0 5) (0 5)
(0 6) (0 6)
(0 7) (0 7)
(0 8) (0 8)
(0 9) (0 9)
(1 5) (1 5)
etc
So I think my compare function is correct. Using it in the treemap, however, gives some strange results:
(def inputs-kvs
"Inputs in the format that the hash-map and sorted-map constructor understand"
(mapcat #(vector % (apply str %))
(clojure.math.combinatorics/cartesian-product
(range 0 4)
(range 5 10))))
Putting these in a hashmap gives the correct answer
(count (apply assoc (hash-map) inputs-kvs))
=> 20
But putting them in the treemap with the given comparison:
(def structure (sorted-map-by compare))
(count (apply assoc structure inputs-kvs))
=> 4
(apply assoc structure inputs-kvs)
=> {(0 5) "25", (1 6) "36", (2 7) "37", (3 5) "39"}
"25" has been stored in the (0 5)
slot. But the compare function doesn't say that (0 5)
and (2 5)
are the same:
=> (compare [0 5] [2 5])
-1
What am I doing wrong? Can I make this work? Is it even meaningful to project a 2-dimensional space onto a 1-dimensional one?
(To head off a question you may have, yes I've tried a 2-dimensional structure, e.g. (sorted-map 1 (sorted-map 2 :value))
, but I'm trying to find alternatives with better performance)
Clojure already comes with it's own compare
:
user=> (doc compare)
-------------------------
clojure.core/compare
([x y])
Comparator. Returns a negative number, zero, or a positive number
when x is logically 'less than', 'equal to', or 'greater than'
y. Same as Java x.compareTo(y) except it also works for nil, and
compares numbers and collections in a type-independent manner. x
must implement Comparable
Which behaves the same as OPs own function, but most likely is more efficient:
user=> (compare [1 1] [1 1])
0
user=> (compare [1 1] [2 2])
-1
user=> (compare [2 1] [1 2])
1
The behaviour is documented in the Section about Vectors (IPersistentVector) in the Data Structures docs:
Vectors are compared first by length, then each element is compared in order.
So you can just use sorted-map-by compare
from core, or since that's the default anyway just sorted-map
for your data structure:
user=> (def m (into {} (let [r #(- (rand-int 10) (rand-int 10))] (for [a (range -1 2) b (range -1 2)] [[(r) (r)] (str a b)]))))
#'user/m
user=> (>pprint m)
{[-7 -4] "10",
[-3 5] "01",
[-5 -7] "00",
[5 2] "11",
[-3 1] "-10",
[7 -4] "-11",
[0 -6] "0-1",
[3 1] "-1-1",
[-8 -1] "1-1"}
nil
user=> (>pprint (into (sorted-map-by compare) m))
{[-8 -1] "1-1",
[-7 -4] "10",
[-5 -7] "00",
[-3 1] "-10",
[-3 5] "01",
[0 -6] "0-1",
[3 1] "-1-1",
[5 2] "11",
[7 -4] "-11"}
nil
user=> (>pprint (into (sorted-map) m))
{[-8 -1] "1-1",
[-7 -4] "10",
[-5 -7] "00",
[-3 1] "-10",
[-3 5] "01",
[0 -6] "0-1",
[3 1] "-1-1",
[5 2] "11",
[7 -4] "-11"}
nil
user=> (assert (= (into (sorted-map-by compare) m) (into (sorted-map) m)))
nil