data-structuresclojureclojure-core

How do I use a treemap for a 2 dimensional keys?


I want to map a large number of tuples. My map looks something like:

{[1 2] :thing}

Except there may be a few million of them. I have a feeling that a tree-map might be a good thing to test so I'm trying to get it working. I can't seem to get the comparison function right though.

(defn compare 
  [[x y] [xx yy]]
  (cond
   (and (= x xx) (= y yy)) 0
   (and (<= x xx) (<= y yy)) -1
   (and (<= x xx) (> y yy)) -1
   (and (> x xx) (<= y yy)) 1
   (and (> x xx) (> y yy)) 1))

Some trivial inputs seem to work

user=> (compare [1 1] [1 1])
0
user=> (compare [1 1] [2 2])
-1
user=> (compare [1 2] [2 1])
-1
user=> (compare [2 1] [1 2])
1

But if I create inputs that cover all combinations, the map should consider them all different.

(def inputs
    "All tuples of [0-4, 5-10]."
    (clojure.math.combinatorics/cartesian-product
      (range 0 4) 
      (range 5 10)))

(def input-pairs
     "All possible pairs of tuples"
     (clojure.math.combinatorics/cartesian-product inputs inputs))

If I test the comparison function, it returns zero only when the two vectors are structurally identical.

user=> (doseq [[a b] input-pairs]
  #_=>   (when (zero? (compare a b)) (prn a b)))
(0 5) (0 5)
(0 6) (0 6)
(0 7) (0 7)
(0 8) (0 8)
(0 9) (0 9)
(1 5) (1 5)
etc

So I think my compare function is correct. Using it in the treemap, however, gives some strange results:

(def inputs-kvs
    "Inputs in the format that the hash-map and sorted-map constructor understand"
    (mapcat #(vector % (apply str %))
            (clojure.math.combinatorics/cartesian-product
              (range 0 4) 
              (range 5 10))))

Putting these in a hashmap gives the correct answer

(count (apply assoc (hash-map) inputs-kvs))
=> 20

But putting them in the treemap with the given comparison:

(def structure (sorted-map-by compare))
(count (apply assoc structure inputs-kvs))
=> 4

(apply assoc structure inputs-kvs)
=> {(0 5) "25", (1 6) "36", (2 7) "37", (3 5) "39"}

"25" has been stored in the (0 5) slot. But the compare function doesn't say that (0 5) and (2 5) are the same:

=> (compare [0 5] [2 5])
-1

What am I doing wrong? Can I make this work? Is it even meaningful to project a 2-dimensional space onto a 1-dimensional one?

(To head off a question you may have, yes I've tried a 2-dimensional structure, e.g. (sorted-map 1 (sorted-map 2 :value)), but I'm trying to find alternatives with better performance)


Solution

  • Clojure already comes with it's own compare:

    user=> (doc compare)
    -------------------------
    clojure.core/compare
    ([x y])
      Comparator. Returns a negative number, zero, or a positive number
      when x is logically 'less than', 'equal to', or 'greater than'
      y. Same as Java x.compareTo(y) except it also works for nil, and
      compares numbers and collections in a type-independent manner. x
      must implement Comparable
    

    Which behaves the same as OPs own function, but most likely is more efficient:

    user=> (compare [1 1] [1 1])
    0
    user=> (compare [1 1] [2 2])
    -1
    user=> (compare [2 1] [1 2])
    1
    

    The behaviour is documented in the Section about Vectors (IPersistentVector) in the Data Structures docs:

    Vectors are compared first by length, then each element is compared in order.

    So you can just use sorted-map-by compare from core, or since that's the default anyway just sorted-map for your data structure:

    user=> (def m (into {} (let [r #(- (rand-int 10) (rand-int 10))] (for [a (range -1 2) b (range -1 2)] [[(r) (r)] (str a b)]))))
    #'user/m
    user=> (>pprint m)
    {[-7 -4] "10",
     [-3 5] "01",
     [-5 -7] "00",
     [5 2] "11",
     [-3 1] "-10",
     [7 -4] "-11",
     [0 -6] "0-1",
     [3 1] "-1-1",
     [-8 -1] "1-1"}
    nil
    user=> (>pprint (into (sorted-map-by compare) m))
    {[-8 -1] "1-1",
     [-7 -4] "10",
     [-5 -7] "00",
     [-3 1] "-10",
     [-3 5] "01",
     [0 -6] "0-1",
     [3 1] "-1-1",
     [5 2] "11",
     [7 -4] "-11"}
    nil
    user=> (>pprint (into (sorted-map) m))
    {[-8 -1] "1-1",
     [-7 -4] "10",
     [-5 -7] "00",
     [-3 1] "-10",
     [-3 5] "01",
     [0 -6] "0-1",
     [3 1] "-1-1",
     [5 2] "11",
     [7 -4] "-11"}
    nil
    user=> (assert (= (into (sorted-map-by compare) m) (into (sorted-map) m)))
    nil