I have undirected connected graph with one main node (denoted by M
in examples below).
I am trying to find efficient algorithm to following specification:
Example 1:
Expected output: {2: [B,C,D,E,F,G,H], 4: [D,E,F]}
Explanation: for all nodes from B
to the right the path must go through edge 2
. However the nodes A
, B
and C
are not under edge 4
because no path between any of them and M
contains edge 4
.
Example 2:
Expected output: {1: [A,B,C,D,E,F], 2: [A,B,C,D,E,F], 5: [D,E,F]}
Explanation: both of nodes A
and B
must belong to both of edges 1
and 2
because node M
is reachable from each of them (besides the shortest path) also through node C
. Path length does not matter. Like in example 1, none of nodes in the cycle belongs to edge 5
.
Example 3: (UPDATE - counterexample for @user58697's comment)
Expected output: {1: [A,B,C,D,E,F], 5: [D,E,F]}
Explanation: for every node A
..F
there exists path through edge 1
to M
(for example BCAM
, FECAM
). However none of nodes A
, B
, C
can reach M
through edge 5
so they do not belong to it in result map (ACDFECBM
is a trail, not path).
Straightforward solution
The dummy algorithm works correctly (I show pseudocode because I have implemented it in Neo4j Traversal framework actually):
Map<Edge,Set<Node>> execute(Set<Edge> input) {
Map<Edge,Set<Node>> result = new HashMap<>();
Node m = graph.getMainNode();
for (Node n : graph.getAllNodes()) {
for (Path p : depthFirstSearchAllUniquePathsFrom(n, m)) {
for (Edge e : p.getEdges().filter(input::contains).toList()) {
result.computeIfAbsent(e, new HashSet<>()).add(n);
}
}
}
return result;
}
The problem
The algorithm is slow. The traversal of graph of 758 nodes and 133 input edges with many cycles takes 20min on my local computer. Debugging proves there is a lot of similar paths (the depthFirstSearchAllUniquePathsFrom
is actually traversal with Uniqueness.NODE_PATH
which is very time consuming).
I seek for efficient algorithm which would better utilize information gained during traversal (for comparison - something like if Floyd-Warshall algorithm finds all paths with better complexity than would multiple calls of Dijkstra algorithm between each pair do).
I actually don't want to enumerate paths (which has exponential of factorial complexity). I cannot distinguish whether I am falling into trap of some NP-hard problem or there is just some simple solution I am overlooking.
Notes
UPDATE
Following the update to the explanation in the question, the following query will return a collection of all distinct nodes for each relationship ID for which a path exists that:
UNWIND $edgeIds AS edgeId
MATCH p = SHORTEST 1 (m {id: "M"})--*()-[r {id: edgeId}]-()--*(s)
WHERE size(nodes(p)) = COUNT { UNWIND nodes(p) AS n RETURN DISTINCT n }
RETURN edgeId, collect(s.id) AS nodes
To show it works, I made the values "M", 2 etc values of property id
of the nodes and relationships for my examples.
Here is the data and results for the first example:
CREATE ({id: "M"})-[:R]->({id: "A"})-[:R {id: 2}]->({id: "B"})-[:R {id: 3}]->
(n15 {id: "C"})-[:R {id: 4}]->(n3 {id: "D"}),
({id: "F"})<-[:R]-(n3)-[:R]->({id: "E"}),
(n15)-[:R]->({id: "G"})-[:R]->({id: "H"})
:param edgeIds => [2, 4]
+----------------------------------------------+
| edgeId | nodes |
+----------------------------------------------+
| 2 | ["B", "C", "D", "G", "F", "E", "H"] |
| 4 | ["D", "F", "E"] |
+----------------------------------------------+
The second example:
CREATE (n10 {id: "B"})<-[:R {id: 2}]-({id: "M"})-[:R {id: 1}]->
({id: "A"})-[:R]->(n11 {id: "C"})<-[:R]-(n10),
(n11)-[:R {id: 5}]->(n12 {id: "D"})-[:R]->({id: "E"}),
(n12)-[:R]->({id: "F"})
:param edgeIds => [1, 2, 5]
+-----------------------------------------+
| edgeId | nodes |
+-----------------------------------------+
| 1 | ["A", "C", "B", "D", "E", "F"] |
| 2 | ["B", "C", "A", "D", "E", "F"] |
| 5 | ["D", "E", "F"] |
+-----------------------------------------+
The third example:
CREATE (n18 {id: "B"})<-[:R {id: 2}]-({id: "M"})-[:R {id: 1}]->
({id: "A"})-[:R {id: 3}]->(n19 {id: "C"})<-[:R {id: 4}]-(n18),
(n19)-[:R {id: 5}]->({id: "D"})-[:R {id: 7}]->({id: "F"})<-[:R {id: 8}]-
({id: "E"})<-[:R {id: 6}]-(n19);
:param edgeIds => [1, 5]
+-----------------------------------------+
| edgeId | nodes |
+-----------------------------------------+
| 1 | ["A", "C", "D", "E", "B", "F"] |
| 5 | ["D", "F", "E"] |
+-----------------------------------------+