orientdbtraversalarangodb

Filtering a return group by traversal in ArangoDB


I'm in the process of evaluating ArangoDB to be used instead of OrientDB. My dataset is essentially a forest of not-necessarily connected trees (a family tree).

Because the dataset is a directed acyclic graph (a tree), it's always more efficient to walk up the tree looking for something than down the tree.

In earlier versions of OrientDB, before they removed this critical feature for me, I was able to do the following query:

SELECT FROM Person WHERE haircolor = "Red" and in traverse(0, -1, "in") (birth_country = "Ireland")

Since haircolor is an indexed field, it's efficient to get all of those vertices. The magic is in the traverse operator within the WHERE clause, which stops traversal and immediately returns TRUE if it locates any ancestor from Ireland.

Yes, you can turn it around and look for all those from Ireland, and then walk downward looking for those pesky redheads, returning them, but it is substantially less efficient, since you have to evaluate every downward path, which potentially expands exponentially.

Since OrientDB shot themselves in the foot (in my opinion) by taking that feature out, I'm wondering if there's an ArangoDB query that would do a similar task without walking down the tree.

Thanks in advance for your help!


Solution

  • In AQL, it would go something like this:

    FOR redhead IN Person                             // Find start vertices
      FILTER doc.haircolor == "Red"
      FOR v, e, p IN 1..99 INBOUND redhead Ancestor   // Traversal up to depth 99
        PRUNE v.birth_country == "Ireland"            // Don't walk further if condition is met
        RETURN p                                      // Return the entire path
    

    This assumes that the relations (edges) are stored in an edge collection called Ancestor.

    PRUNE prevents further traversal down (or here: up) the path but includes the node that it is at. https://www.arangodb.com/docs/stable/aql/graphs-traversals.html#pruning

    Note that the variable depth traversal returns not only the longest paths but also "intermediate" paths of the same route. You may want to filter them out on the client-side or take a look at this AQL solution at the cost of additional traversals: https://stackoverflow.com/a/64931939/2044940