I’m using a MongoDB database to keep track of analytics for an application. I’m writing a Clojure application (using clj-time and Monger) to get data out of the database.
I have a collection containing records like
{"_id": ObjectId(...),
timestamp: ISODate("2013-06-01T15:18:37Z"),
device: "04dbf04b6dc0d0a4fd383967b3dc62f50111e07e"}
Each different device
represents a different user of my service. What I’d like to do is to find out how many (unique) users I have each day, but with the caveat that I’d like “day” to refer specifically to the US/Central time zone, taking daylight saving into account. (If that weren’t a requirement I think I could just do something like a $group
and then a distinct
.)
Here’s what I’ve been doing:
(ns analytics.reporting
(:use [monger.core :only [connect! connect set-db! get-db]]
monger.operators
clj-time.core
clj-time.periodic
clj-time.format)
(:require [monger.collection :as mc]))
(defn to-central
[dt]
(from-time-zone dt (time-zone-for-id "America/Chicago")))
(defn count-distinct
[coll]
(count (distinct coll)))
(defn daily-usage
[ndays]
(let [midnights (map to-central
(reverse (for [offset (map days (range ndays))]
(minus (to-central (today-at 0 0)) offset))))
by-day (for [midnight midnights]
(mc/find-maps "devices" {:timestamp {$gte midnight $lt (plus midnight (days 1))}}))
devices-by-day (map #(map :device %) by-day)
distinct-devices-by-day (map count-distinct devices-by-day)]
distinct-devices-by-day))
If you can’t read Clojure, this basically says: get a list of the most recent n midnights in the Central time zone, and then run Mongo queries to find all of the records between each successive pair of midnights. Then, count the number of distinct device
s within each day.
Here’s what I don’t like about this approach:
device
s should also be done by the database.Is there a way I can shove more of this logic into the MongoDB query?
As suggested by @WiredPrairie, I ended up just including the Central-time date in each record as I added it to the database. Then I was able to use a trivial $group
query to gather the number of records for each date.