clojuremonger

which is faster? map or reduce with condition fn or get-in?


I am using monger and fetching a batch from my mongo nosql database using find-maps. It returns an array that I plan to use as a datastore argument (reference) downstream in my chain of function calls. Within those future function calls, I will have access to a corresponding id. I hope to use this id as a lookup to fetch within my datastore so I don't have to make another monger call. A datastore in the form of an array doesn't seem like the fastest way to get access to the object by id .... but I am not certain.

If I needed to derive an object from this datastore array, then I'd need to use a function like this (that has to log(n) over every element)

(defn fetchObjFromArray [fetch_id inputarray]

    (reduce (fn [reduced_obj element_obj]

                (if (= fetch_id (get-in element_obj [:_id]))
                    element_obj ;; ignoring duplicates for conversation
                    reduced_obj 
                )    
            )
            {}
            inputarray
    )
)

Instead, if after my initial monger call, I create a key/val hash object with a function like this:

(defn createReportFromObjArray [inputarray]

    (reduce (fn [returnobj elementobj]

                (let [_id (get-in elementobj [:_id])
                      keyword (keyword _id)]

                    (assoc returnobj keyword elementobj)
                ) ;; ignoring duplicates for conversation
            )
            {}
            inputarray)
)

then perhaps my subsequent calls could instead use get-in and that would be much faster because I would be fetching by key?

I am confused because: when I use get-in, doesn't it have to iterate over each key in the key/val has object until it finds a match between the key and the fetch_id:

(let [report (createReportFromObjArray inputarray)
      target_val (get-in report [(keyword fetch_id)])]

Why doesn't get-in have to log(n) over every key? Maybe its faster because it can stop when it finds the first "match" where map/reducing has to go the whole way through log(n)? How is this faster than having to iterate over each element in an array and checkin whether id matches fetch_id?

I am very grateful for help you can offer.


Solution

  • In your second code example you are building a Clojure hash map in linear time. Via get and derivations they have lookup performance of O(log32(N)).

    In the first example you scan the entire input and return the last element that matched the ID or the empty hash map, probably unintentionally.

    _

    I recommend to use (group-by :_id) instead of the second code example. I also recommend to use (first (filter (comp #{fetch_id} :_id) inputarray)) in place of the first example.

    Avoid casting to keywords via keyword - Clojure keywords should generally be known at compile time. Maps support arbirtrary data types as keys.