clojuredatascript

Slow Datascript query


I'm using Datascript to query a tree structure for the last common ancestor of 2 nodes having given names, here's what I've got so far, but it's really slow -- any idea why (or is there a better way)?

(defn lca
  "Last common ancestor"
  [db name1 name2]
  (d/q '[
          :find [(pull ?anc [:db/id :name]) ...]
          :in    $ % ?name1 ?name2
          :where
            (?node1 :name ?name1)
            (?node2 :name ?name2)
            (anc ?anc1 ?node1)
            (anc ?anc2 ?node2)
            [(not= ?anc1 ?anc2)]
            (parent ?anc ?anc1)
            (parent ?anc ?anc2)
          ]
          @db
          '[
            [ (parent ?par ?child)
              (?par :children ?child)]
            [ (anc ?par ?child)
              (?par :children ?child)]
            [ (anc ?anc ?child)
              (?par :children ?child)
              (anc ?anc ?par)]
            ]
          name1
          name2))

I initially was going to use not to exclude all higher ancestors than the last common one, but Datascript currently doesn't support not hence the two parent clauses.

The schema is:

:children {:db/valueType :db.type/ref 
           :db/cardinality :db.cardinality/many 
           :db/index true}
:name {:db/index true}

Solution

  • Well, recursive rules are not the fastest thing in DataScript. So you can probably make your query a little bit faster by inlining parent rule directly into query code.

    Another thing is that queries are not the fastest thing is DataScript as well. There’s fair amount of time spent parsing the query, allocating intermediate collections, iterating over them, managing variables, etc. There’re two situations in which you can prefer query over manual database/indexes access:

    1. Query works faster than you’d write yourself (e.g. when working with large relations, query will utilize hash joins, a thing that is very tedious to write manually)
    2. Query express your problem in a much simpler way than imperative algorithm

    In your case, neither of these is true (you don’t really work with relations, you walk the graph linearly). Also, there’s a bug: your query won’t work if node1 and node2 have common direct parent.

    What I recommend is to do the same thing by accessing entities directly. Entities are just index lookups without any overhead associated with queries, so in such a simple case they should work much faster.

    Something like this should suffice:

    (defn parent [node]
      (first (:_children node)))
    
    
    (defn ancestors [node]
      (->> node
           (iterate parent)
           (take-while some?)
           reverse))
    
    
    (defn last-common-ancestor [db name1 name2]
      (let [node1 (d/entity db [:name name1])
            node2 (d/entity db [:name name2])]
             ;; zipping ancestor chains together
        (->> (map vector (ancestors node1) (ancestors node2))
             ;; selecting common prefix
             (take-while (fn [[ac1 ac2]] (= ac1 ac2)))
             ;; last item in common prefix is what you looking for
             (last))))