clojure monger

Clojure: Aggregate and Count in Maps


I guess this question qualifies as an entry-level Clojure problem. I'm having trouble processing a Clojure map multiple times to extract different kinds of data.

Given a collection of maps like this, I'm trying to count entries based on several keys:

[
  {
    "a": "X",
    "b": "M",
    "c": 188
  },
  {
    "a": "Y",
    "b": "M",
    "c": 165
  },
  {
    "a": "Y",
    "b": "M",
    "c": 313
  },
  {
    "a": "Y",
    "b": "P",
    "c": 188
  }
]

First, I want to group the entries by the a-key values:

{
  "X" : [
    {
      "b": "M",
      "c": 188
    }
  ],
  "Y" : [
    {
      "b": "M",
      "c": 165
    },
    {
      "b": "M",
      "c": 313
    },
    {
      "b": "P",
      "c": 188
    }
  ]
}

Second, I want to treat entries with the same b-key value as duplicates and ignore the remaining keys:

{
  "X" : [
    {
      "b": "M"
    }
  ],
  "Y" : [
    {
      "b": "M"
    },
    {
      "b": "P"
    }
  ]
}

Then, simply count the distinct b-key values for each a-key:

{
  "X" : 1,
  "Y" : 2
}

As I'm getting the data through monger, I defined:

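;; assumes monger.core is aliased as mg, monger.collection as mc,
;; and that db-name is defined elsewhere
(defn db-query
  ([coll-name]
     (with-open [conn (mg/connect)]
       (doall (mc/find-maps (mg/get-db conn db-name) coll-name)))))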

and then hit a roadblock:

(defn get-sums [request]
  (->> (db-query "data")
       (group-by :a)
       (into {})
       keys))

How could I continue from here?


Solution

  • This is a naive approach and I am sure there are better ways, but it might be enough to help you figure it out.

    (into {}
          (map
           ; f: turns each [k vs] pair into [k `unique count`]
           (fn [[k vs]]
             [k (count (into #{} (map #(get % "b") vs)))])
           ; coll: the entries grouped with their "a"s as keys
           (group-by #(get % "a") DATA)))
    ;user=> {"X" 1, "Y" 2}
    

    Explanation:

    ; I am using your literal data as DATA, just removed the , and :
    (def DATA [{...
    
    (group-by #(get % "a") DATA) ; groups by "a" as keys
    ; so I get a map {"X":[{},...] "Y":[{},{},{},...]}
    
    ; then I map over each [k v] pair where
    ; k is the map key and
    ; vs are the grouped maps in a vector
    (fn [[k vs]]
          ; here `k` is e.g. "Y" and `vs` are the maps {a _, b _, c _}

          ; now `(map #(get % "b") vs)` gets me all the b values
          ; `into` a set makes them unique
          ; `count` counts them
          ; finally I return a vector with the same key `k`,
          ;   but the value is the number of unique `b`s
          [k (count (into #{} (map #(get % "b") vs)))])
    
    ; at the end I just put the result `[ ["Y" 2] ["X" 1] ]` `into` a map {}
    ; so you get a map
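
The same idea can also be written as a `->>` pipeline, which is closer to the shape of `get-sums` in the question. This is only a sketch: it assumes the maps coming back from the database have keyword keys (as the `(group-by :a)` in the question suggests), and `sample-data` and `count-distinct-b` are hypothetical names introduced here for illustration.

```clojure
;; Hypothetical sample data with keyword keys, mirroring the four
;; entries from the question.
(def sample-data
  [{:a "X" :b "M" :c 188}
   {:a "Y" :b "M" :c 165}
   {:a "Y" :b "M" :c 313}
   {:a "Y" :b "P" :c 188}])

(defn count-distinct-b
  "Groups `entries` by :a and counts the distinct :b values per group."
  [entries]
  (->> entries
       (group-by :a)                         ; {"X" [...], "Y" [...]}
       (map (fn [[k vs]]
              ;; a set drops duplicate :b values before counting
              [k (count (set (map :b vs)))]))
       (into {})))                           ; back to a map

(count-distinct-b sample-data)
;; => {"X" 1, "Y" 2}
```

If that assumption holds, the body of `get-sums` could probably be reduced to `(count-distinct-b (db-query "data"))`. With string keys, as in the literal data above, `:a` and `:b` would need to be replaced by `#(get % "a")` and `#(get % "b")`.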