algorithmperformanceneo4jgraph-theorygraph-traversal

For given edge, find all nodes the edge lies on any path from


I have undirected connected graph with one main node (denoted by M in examples below).

I am trying to find efficient algorithm to following specification:

Example 1:

example 1

Expected output: {2: [B,C,D,E,F,G,H], 4: [D,E,F]}

Explanation: for all nodes from B to the right the path must go through edge 2. However the nodes A, B and C are not under edge 4 because no path between any of them and M contains edge 4.

Example 2:

example 2

Expected output: {1: [A,B,C,D,E,F], 2: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: both of nodes A and B must belong to both of edges 1 and 2 because node M is reachable from each of them (besides the shortest path) also through node C. Path length does not matter. Like in example 1, none of nodes in the cycle belongs to edge 5.

Example 3: (UPDATE - counterexample for @user58697's comment)

example 3

Expected output: {1: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: for every node A..F there exists path through edge 1 to M (for example BCAM, FECAM). However none of nodes A, B, C can reach M through edge 5 so they do not belong to it in result map (ACDFECBM is a trail, not path).

Straightforward solution

The dummy algorithm works correctly (I show pseudocode because I have implemented it in Neo4j Traversal framework actually):

Map<Edge,Set<Node>> execute(Set<Edge> input) {
    Map<Edge,Set<Node>> result = new HashMap<>();
    Node m = graph.getMainNode();
    for (Node n : graph.getAllNodes()) {
        for (Path p : depthFirstSearchAllUniquePathsFrom(n, m)) {
            for (Edge e : p.getEdges().filter(input::contains).toList()) {
                result.computeIfAbsent(e, new HashSet<>()).add(n);
            }
        }
    }
    return result;
}

The problem

The algorithm is slow. The traversal of graph of 758 nodes and 133 input edges with many cycles takes 20min on my local computer. Debugging proves there is a lot of similar paths (the depthFirstSearchAllUniquePathsFrom is actually traversal with Uniqueness.NODE_PATH which is very time consuming).

I seek for efficient algorithm which would better utilize information gained during traversal (for comparison - something like if Floyd-Warshall algorithm finds all paths with better complexity than would multiple calls of Dijkstra algorithm between each pair do).

I actually don't want to enumerate paths (which has exponential of factorial complexity). I cannot distinguish whether I am falling into trap of some NP-hard problem or there is just some simple solution I am overlooking.

Notes


Solution

  • UPDATE

    Following the update to the explanation in the question, the following query will return a collection of all distinct nodes for each relationship ID for which a path exists that:

    UNWIND $edgeIds AS edgeId
    MATCH p = SHORTEST 1 (m {id: "M"})--*()-[r {id: edgeId}]-()--*(s)
    WHERE size(nodes(p)) = COUNT { UNWIND nodes(p) AS n RETURN DISTINCT n }
    RETURN edgeId, collect(s.id) AS nodes
    

    To show it works, I made the values "M", 2 etc values of property id of the nodes and relationships for my examples.

    Here is the data and results for the first example:

    CREATE ({id: "M"})-[:R]->({id: "A"})-[:R {id: 2}]->({id: "B"})-[:R {id: 3}]->
           (n15 {id: "C"})-[:R {id: 4}]->(n3 {id: "D"}),
           ({id: "F"})<-[:R]-(n3)-[:R]->({id: "E"}),
           (n15)-[:R]->({id: "G"})-[:R]->({id: "H"})
    
    :param edgeIds => [2, 4]
    
    +----------------------------------------------+
    | edgeId | nodes                               |
    +----------------------------------------------+
    | 2      | ["B", "C", "D", "G", "F", "E", "H"] |
    | 4      | ["D", "F", "E"]                     |
    +----------------------------------------------+
    

    The second example:

    CREATE (n10 {id: "B"})<-[:R {id: 2}]-({id: "M"})-[:R {id: 1}]->
           ({id: "A"})-[:R]->(n11 {id: "C"})<-[:R]-(n10),
           (n11)-[:R {id: 5}]->(n12 {id: "D"})-[:R]->({id: "E"}),
           (n12)-[:R]->({id: "F"})
    
    :param edgeIds => [1, 2, 5]
    
    +-----------------------------------------+
    | edgeId | nodes                          |
    +-----------------------------------------+
    | 1      | ["A", "C", "B", "D", "E", "F"] |
    | 2      | ["B", "C", "A", "D", "E", "F"] |
    | 5      | ["D", "E", "F"]                |
    +-----------------------------------------+
    

    The third example:

    CREATE (n18 {id: "B"})<-[:R {id: 2}]-({id: "M"})-[:R {id: 1}]->
           ({id: "A"})-[:R {id: 3}]->(n19 {id: "C"})<-[:R {id: 4}]-(n18),
           (n19)-[:R {id: 5}]->({id: "D"})-[:R {id: 7}]->({id: "F"})<-[:R {id: 8}]-
           ({id: "E"})<-[:R {id: 6}]-(n19);
    
    :param edgeIds => [1, 5]
    
    +-----------------------------------------+
    | edgeId | nodes                          |
    +-----------------------------------------+
    | 1      | ["A", "C", "D", "E", "B", "F"] |
    | 5      | ["D", "F", "E"]                |
    +-----------------------------------------+