Given a graph database I have an actors vetices collection that has edges connecting it to a scenes vertices collection.
I'm having trouble creating a query where I can select all the disjoint subgraphs, that is: given subgraphs A and B in the database, there's no edge (OUTBOUND or INBOUND) connecting sub-graph A to sub-graph B.
From this AQL:
FOR actor IN actors
FOR v, e, p IN 1..1 ANY actor._id acts_in_scenes
RETURN e
I can get the following graph result as in the uploaded image, the result that I want is a list of lists containing all of the vertexes inside disjoint subgraphs (both actors and scenes). Examples were circled in red.
I've made several attempts using subqueries and collects, I think the best result up to this point is the query bellow, but still it's just returning me scenes:
FOR actor IN actors
LET sub_result = (FOR v, e, p IN 1..1 ANY actor._id acts_in_scenes RETURN DISTINCT v._id) // this just returns me scenes
FILTER LENGTH(sub_result) > 0
RETURN DISTINCT sub_result
Does anyone know if this is possible to solve with an AQL query?
EDIT: So I've increased the depth to 5 (1..5) in the graph traversal subquery and now I can get actors vertices. Now the problem is when checking the result json, I can see repeated scene keys on the groups, this should not be possible if this result would represent the disjoint sub-graphs:
FOR actor IN actors
LET sub_result = (
FOR v, e, p IN 1..5 ANY actor._id acts_in_scenes
SORT v._id
RETURN DISTINCT v._id
)
FILTER LENGTH(sub_result) > 0
SORT COUNT(sub_result) DESC
RETURN DISTINCT { 'count': COUNT(sub_result), result: sub_result }
EDIT 2:
I had to solve this by creating a graph on the application side using networkx library and using the nx.connected_components() function. But I really wish I could solve this by using just the graph features of the database, since it adds complexity to the application and mandates me to create a graph in memory on the app side.
In ArangoDB v3.7, a new Pregel algorithm wcc
for finding Weakly Connected Components was added:
https://www.arangodb.com/docs/stable/release-notes-new-features37.html#pregel
It allows you to compute the subgraphs upfront. In arangosh:
var pregel = require("@arangodb/pregel");
var params = {
"maxGSS": db.actors.count(), /* the number of vertices */
"resultField": "subgraph"
};
var id = pregel.start("wcc", {
vertexCollections:["actors"],
edgeCollections:["acts_in_scenes"]
}, params);
When it's done (pregel.status(id).state
equals "done"
), each actor document will have a numeric attribute "subgraph"
that is the same for all vertices of the same subgraph.